INTELLIGENCE EXAMINATION DELTA 2
M. E. HAGGERTY 


University of Minnesota 


The primary purpose of this paper is to present a revised table of 
age norms for the Haggerty Intelligence Examination, Delta 2, for 
which there has been a constant and persistent demand since the 
initial publication of the test. Advantage will be taken of the occa- 
sion to present certain pertinent facts derived from the use of the 
examination during the two years since it first became available for 
general use. 

The age norms by years and months are given in Table I. A 
mental growth curve based on these norms is shown in Figure 1. 
These norms are based on the results of the examination of more than 
40,000 individuals ranging from Grade III of the elementary school to 
the second year of college. The norms are not exact medians for any 
particular group nor for all combined. There is increasing evidence 
that groups of exactly the same median chronological age will differ 
as much as a full year in mental age, as measured by examinations of 
the Delta 2 type. Thus, 12-year-old pupils in the 1-teacher schools of 
New York State score 75, whereas pupils of the same chronological 
age in the larger rural schools of New York score 93, a difference of 
18 points. These two groups are fairly large, 446 and 656 individuals
respectively, and there is high presumption that these 12-year-olds
are relatively unselected groups from their several communities.
Within the same city system two Grade VIII classes of approximately the
same chronological age may differ as much as 2 full years in mental
development, in terms of this test. Any age norm, therefore, based
upon any particular group of individuals, will be inaccurate, the
amount of inaccuracy depending upon the degree of selection
represented. Until someone devises a testing program which obviates the
factor of selection the most satisfactory age norms will be obtained by
inference and construction, based upon the differential data from
multiple groups. No statistical procedure as yet proposed avoids the
necessity of some personal judgment in fixing age norms.

TABLE I.—HAGGERTY INTELLIGENCE EXAMINATION, DELTA 2¹

Ages in                            Months
years      0    1    2    3    4    5    6    7    8    9   10   11
    7      7    8    9   10   11   12   13   15   16   17   18   19
    8     20   22   24   25   27   29   31   33   35   37   38   40
    9     42   43   45   46   47   49   50   51   53   54   55   57
   10     58   59   60   61   62   63   64   65   66   67   68   69
   11     70   71   72   73   74   75   76   77   78   79   80   81
   12     82   83   84   85   86   87   88   89   90   91   92   93
   13     94   95   96   97   98   99  100  100  101  102  103  104
   14    105  106  107  108  109  110  110  111  112  113  114  115
   15    116  117  118  118  119  120  121  121  122  123  124  124
   16    125  126  126  127  127  128  128  129  130  130  131  131
   17    132  132  133  133  134  134  135  135  135  136  136  137
   18    137  137  138  138  138  139  139  139  140  140  140  141
   19    141  141  142  142  142  142  143  143  143  143  144  144
   20    144

¹ Age norms for individuals of ages 7 to 20 years, based on about 40,000 cases.
Figures in first column opposite years indicate normal scores for individuals of
even ages. Figures in succeeding columns to right indicate normal scores for
months beyond even ages.
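The way a table of age norms such as Table I is consulted can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TABLE_I`, `norm_score`, `mental_age` and `iq` are hypothetical, the intelligence quotient is taken as the conventional ratio of mental to chronological age times 100, and a score that falls between two printed norms (or on a repeated norm) is assigned the earliest age whose norm reaches it, a rule the paper does not specify.

```python
from bisect import bisect_left

# Table I: normal score for each age, one list per year (months 0..11).
TABLE_I = {
    7:  [7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19],
    8:  [20, 22, 24, 25, 27, 29, 31, 33, 35, 37, 38, 40],
    9:  [42, 43, 45, 46, 47, 49, 50, 51, 53, 54, 55, 57],
    10: [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
    11: [70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81],
    12: [82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93],
    13: [94, 95, 96, 97, 98, 99, 100, 100, 101, 102, 103, 104],
    14: [105, 106, 107, 108, 109, 110, 110, 111, 112, 113, 114, 115],
    15: [116, 117, 118, 118, 119, 120, 121, 121, 122, 123, 124, 124],
    16: [125, 126, 126, 127, 127, 128, 128, 129, 130, 130, 131, 131],
    17: [132, 132, 133, 133, 134, 134, 135, 135, 135, 136, 136, 137],
    18: [137, 137, 138, 138, 138, 139, 139, 139, 140, 140, 140, 141],
    19: [141, 141, 142, 142, 142, 142, 143, 143, 143, 143, 144, 144],
    20: [144],
}

# Flatten to one non-decreasing list indexed by months past age 7-0.
NORMS = [score for year in sorted(TABLE_I) for score in TABLE_I[year]]
FIRST_AGE_MONTHS = 7 * 12


def norm_score(years, months=0):
    """Normal score for a chronological age, read straight from Table I."""
    return NORMS[years * 12 + months - FIRST_AGE_MONTHS]


def mental_age(score):
    """Earliest (years, months) whose norm reaches the score (assumed rule).

    Scores below the first norm are clamped to age 7-0; scores above the
    last norm have no mental age in the table.
    """
    index = bisect_left(NORMS, score)
    if index == len(NORMS):
        raise ValueError("score exceeds the highest norm in Table I")
    return divmod(FIRST_AGE_MONTHS + index, 12)


def iq(score, years, months=0):
    """Ratio IQ: 100 * mental age / chronological age (ages in months)."""
    ma_years, ma_months = mental_age(score)
    return 100 * (ma_years * 12 + ma_months) / (years * 12 + months)


if __name__ == "__main__":
    # The two New York groups of 12-year-olds cited in the text.
    print(mental_age(75), mental_age(93))
```

Under these assumptions the two groups of New York 12-year-olds scoring 75 and 93 receive mental ages of 11-5 and 12-11 respectively.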

The function of such age norms is to serve as points of reference 
for the scores made by children to be examined after the norms are 
fixed. It is not necessary, for practical purposes, that such points 
of reference represent with absolute exactness the median quality of a 
perfectly unselected group of individuals. That the norms shall
approximate such scores within the range of the probable error
theoretically true for a genuinely unselected group of persons of each
chronological age considered, and that they remain constant, is all that
the practical use of a table of norms demands.

How well the norms of Table I meet this practical criterion will be 
evident in Table II which gives the intelligence quotients—figured 
in the usual way—for approximately 1000 children whose scores were 
not considered in the 40,000 cases from which Table I was constructed. 
The median intelligence quotient here is found to be 98.3. It was the
belief of Dr. Miller, who made the tests and accumulated a large amount
of information from achievement and other intelligence tests on these
1000 pupils, that the median IQ should be slightly less than 100, which
would be the theoretically correct score for a perfectly normal group.
While most of the cases were of the type to be found in the average
American community, there was a considerable sprinkling of foreign
children in the group, many of whom had language difficulties which
probably made for lower scores in the Delta 2 test.

Fig. 1.—Intelligence Examination, Delta 2. Mental growth curve. Figures on left
ordinate indicate score. Figures on base line indicate chronological age.


TABLE II.—DISTRIBUTION OF INTELLIGENCE QUOTIENTS, AUSTIN PUBLIC SCHOOLS

                 Elementary                         Junior high
IQ | 3B | 3A | 4B | 4A | 5B | 5A | 6B | 6A | 7B | 7A | 8B | 8A | 9 | Total
PST Ge ieee Weta oes Wis oo | Se: Ges (OPS ERS: SARK ibe. 385s ae 
B1- 55 
56- 60 | ...| . 1... ee i 
61-65 | 2\. eS ae nS Sade Sy ae ie 7 
66-70! 2) 2| 1 2! 3] 1] at ad af al 4 17 
Sree) Sat ee ot) ie ee ed 2} 46 
7-80| 8| 2) 9 8| 5] 4] af 5t af 2] a4}... al 55 
si-85| 8 6| 12:10/11/ 7| 2| of si 4]. 3} 8| el gs 
s6- 90 | 12/11| 14 4|10| 6| 4)10] 4/10! 3) 5\ 15) 108 
91-95; 14 3| 191 6;111;10| 7 9f si 5{ 3i 8 15] 118 
96-100 | 18| 8| 16 6| 7] 8] 8| 5] 10\ 6] 11) 31 121 118 
101-105 | 12) 7/ 13} 6| 9| 7] 7 5] 4i3i 7] 12) 4) 49) 118 
106-110 8| 3|} 71 4] 6| 6] 7 5] 15 6] 7 4} a2 90 
111-115} 2 1} 3| 3} 5| 4! 5 4) sl 3] 10 4} 44! 63 
116-120| 31! 5 1/ 71] 2! 3 6f al al sl al al 56 
121-125 | 4 . 2} 1) 2} 2] o 1 4 5] al al-aal 43 
126-130 | 1 2 “2 le eee es ee a 3} 11 3] 23 
131-135 S14 a8 4 3| . 3} 14 
<< 8 BER OSE 58 BS ee Se ee, aes ae ee 
et we al at. as 4 11 51 18 
Totals   | 104 | 51 | 115 | 62 | 82 | 60 |  66 | 69 |  84 | 50 |  74 |  43 | 138 | 998
Medians  |  96 | 90 |  93 | 86 | 95 | 98 | 104 | 95 | 104 | 99 | 106 | 101 | 107 | 98.3

In Table III are given the distributions for approximately 3600
pupils, in terms of their chronological ages, and of their mental ages
as measured by the Delta 2 test. These children are elementary
pupils found in the larger rural schools of New York State.¹
Histograms showing the distribution of the entire group in mental and
chronological ages are given as Figures 2 and 3.

¹ New York “rural” schools include towns of 4500 population and less. The
3600 cases are from schools having four and more teachers.

Fig. 2.—Intelligence Examinations, Delta 2. Four-teacher elementary schools.
Grades III to V. Distribution by ages in terms of chronological and mental ages. Solid
line represents chronological age; broken line, mental age. 3675 cases.

Fig. 3.—Intelligence Examinations, Delta 2. Four-teacher elementary schools. Grades
VI to VIII. Distribution by chronological and mental ages. Solid line represents
chronological age; broken line, mental age.

TABLE III.—INTELLIGENCE EXAMINATION, DELTA 2¹

                     Grade III   Grade IV     Grade V    Grade VI   Grade VII  Grade VIII
Years                 CA    MA    CA    MA    CA    MA    CA    MA    CA    MA    CA    MA
7                     21    42     1    31    ..     7    ..    ..    ..    ..    ..    ..
8                    142   190    13   142     1    59    ..    16    ..     3    ..    ..
9                    140   104   235   204    27   105    ..    35    ..     8    ..     2
10                    64    45   195   154   170   110    29    74     5    23    ..     3
11                    24    20   145    90   207   136   190   108    24    57    ..    16
12                    14     9    70    70   122   110   233   162   166    88    43    51
13                     6     2    47    29    78    78   185   127   170   113   138    94
14                    ..    ..    17     6    46    42    85   101   139   126   208   125
15                     1    ..     4     2    17    12    82    46    70    77   112    97
16                    ..    ..     1    ..     1     6     8    23    12    50    47    70
17                    ..    ..    ..    ..    ..     4     1    11     1    21    14    38
18                    ..    ..    ..    ..    ..    ..    ..     5    ..     9     4    30
19                    ..    ..    ..    ..    ..    ..    ..     5    ..     6    ..    15
20                    ..    ..    ..    ..    ..    ..    ..    ..    ..     6    ..    25
Totals               412   412   728   728   669   669   713   713   587   587   566   566
Medians              9.2   8.8  10.5   9.9  11.6  11.3  12.5  12.7  13.5    14  14.4  14.8
Average deviation     .9    .6    .9   1.4   1.0   1.5    .8   [?]    .9   1.6    .8   1.7

¹ Four-teacher elementary schools. Grades III-VIII. Age-grade distribution
in terms of chronological and mental ages. Medians and average deviations given
for both chronological and mental ages in each grade.


SOME CONSIDERATIONS EMPLOYED IN DETERMINING THE NORM


For the determination of the norms, as given in Table I, the 
writer had available the results from several large groups of public 
school children. The major ones employed were as follows: 6184 
rural white pupils in 1-, 2-, 3- and 4-teacher schools, in the state of 
Virginia; 3541 city white pupils, in the state of Virginia; 2323 white 
pupils from the cities of Aberdeen, Baltimore, Cleveland, Evansville, 
Indianapolis, Louisville, Rochester, and Santa Anna; 3755 pupils 
in 4-teacher rural schools, in the state of New York; 3423 pupils from 
1-, 2- and 3-teacher rural schools, in the state of New York. All










264 The Journal of Educational Psychology 


of these groups were distributed in about the normal proportion from
Grades III-VII (Virginia) and Grades III-VIII. There were also
available about 12,000 cases from elementary schools, furnished the
writer by the publisher from returns made by purchasers of the test.¹
In addition there was considerable data reported directly by users of
the test to the writer. Thus, there were 1000 Grade VIII cases from
tests given at one promotion period to all pupils finishing the eighth
grade in the city of Minneapolis, similar material from the city of St.
Paul, and from a large number of smaller cities throughout the country.
For high school students there were results from 1300 children, in
Grades IX-XII inclusive, from the New York survey. Similar
data for approximately 1000 Grade IX children came from the Virginia
survey, and a large amount of similar data was furnished the writer
by users of the test in Wisconsin, Minnesota, Colorado, California
and elsewhere. This material came from both large and small high
schools.

In all of the foregoing data there were available not only medians, 
but complete distributions. Median scores for grades and ages have 
also been furnished in considerable numbers from elementary school, 
high school, and college and normal school groups. 


METHODS USED IN DETERMINING NORMS


The basic method used in determining norms was to construct
distribution tables for each of the several groups in terms of
chronological ages. The median scores for the several age groups were then
compared and a tentative table of age norms based upon these medians 
was constructed. The further work consisted in adjusting these age 
norms in the light of further considerations. 

One method of making such adjustment was to select from a large
group of 3000 or more students those pupils who were of normal
chronological ages for the grades in which they were found. Median
age scores for one such group are to be found in Table IV. A comparison
with the tentative table showed that the medians for any age group
were not identical with the medians for children of the proper
chronological ages for the grades. The second method of readjustment was
to study the progress which typical groups showed from one
chronological age to the next and a similar type of comparison for the progress
covering 2-year intervals (see Table V). A third method was the
construction of a mental-age-grade distribution of the type shown in
Table III. Still another method is represented in Table VI. This
table gives the median scores and median ages for a group of New York
4-teacher elementary schools and 4- and more-teacher high schools. In
the third horizontal row of this table is given the intelligence score
called for by the median ages of the pupils. Thus, in Grade V the median
score for the group is 75. The median age is 11-7 and the corresponding age
score called for by this median age is 77. In this case the actual
median score and the indicated score are not identical, nor are they
identical in any one of the grade groups. The difference in the case of
Grade IX is 8 points, which is almost as great as the difference between
the actual median scores of Grades VIII and IX. These differences
represent, in a degree not calculable from our data, the amount of
selection taking place in the several grade groups.

¹ These data were less valuable than desired, owing to the inability to estimate
in any form the amount of selection represented in most of the returns.


TABLE IV.—DELTA 2 MEDIAN SCORES FOR PUPILS OF NORMAL CHRONOLOGICAL
AGES FOR THE GRADES IN WHICH THEY ARE FOUND

Grade                 III   IV    V   VI  VII VIII   IX    X   XI  XII
Chronological age       9   10   11   12   13   14   15   16   17   18
Score                  43   58   75   92   99  118  123  132  137  143


TABLE V.—INTELLIGENCE EXAMINATION, DELTA 2. INTERAGE STEPS STATED
IN TERMS OF TEST SCORES

                                              Two-year Age Intervals
                              7-9  8-10  9-11 10-12 11-13 12-14 13-15 14-16 15-17 16-18 17-19 18-20
Age norms as given in
  Manual of Directions         ..    30    23    22    21    23    28
New York 1-teacher schools     ..    18    33    27    13    11     2     3
New York 4-teacher schools     ..    27    21    25    20    22    26    17    17     0
Tabulations of 12,000 cases    ..    24    16    18    40    30     2     9    17     2
New norms from Table I         35    38    28    24    24    23    22    20    16    12     9     7


TABLE VI.—INTELLIGENCE EXAMINATION, DELTA 2¹

Grade                       III    IV     V    VI   VII  VIII    IX     X    XI   XII
Score                        39    57    75    91   104   115   125   135   136   141
Median age                  9-2  10-6  11-7  12-6  13-6  14-5  15-1  16-4  17-1 17-11
Corresponding age score      45    64    77    88   100   110   117   127   132   137

¹ Four or more teacher schools of New York. Grades III-XII. Median score
and median age with score corresponding to that age, for each grade.
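The comparison made in Table VI, between each grade's actual median score and the score which Table I calls for at that grade's median age, can be reproduced as a short Python sketch. The names `TABLE_I`, `TABLE_VI`, `norm_score` and `selection_gaps` are hypothetical; only the rows of Table I needed for these median ages are copied in, and the Grade III median score is taken as 39, which is an assumption about a damaged figure in the scanned table.

```python
# Rows of Table I needed here: normal scores by year of age, months 0..11.
TABLE_I = {
    9:  [42, 43, 45, 46, 47, 49, 50, 51, 53, 54, 55, 57],
    10: [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
    11: [70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81],
    12: [82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93],
    13: [94, 95, 96, 97, 98, 99, 100, 100, 101, 102, 103, 104],
    14: [105, 106, 107, 108, 109, 110, 110, 111, 112, 113, 114, 115],
    15: [116, 117, 118, 118, 119, 120, 121, 121, 122, 123, 124, 124],
    16: [125, 126, 126, 127, 127, 128, 128, 129, 130, 130, 131, 131],
    17: [132, 132, 133, 133, 134, 134, 135, 135, 135, 136, 136, 137],
}

# Table VI: grade -> (median score, (median age in years, months)).
# The Grade III score of 39 is an assumed reading of a damaged figure.
TABLE_VI = {
    "III": (39, (9, 2)),
    "IV": (57, (10, 6)),
    "V": (75, (11, 7)),
    "VI": (91, (12, 6)),
    "VII": (104, (13, 6)),
    "VIII": (115, (14, 5)),
    "IX": (125, (15, 1)),
    "X": (135, (16, 4)),
    "XI": (136, (17, 1)),
    "XII": (141, (17, 11)),
}


def norm_score(years, months):
    """Score which Table I calls for at a given age."""
    return TABLE_I[years][months]


def selection_gaps():
    """Actual median score minus the age-norm score, for each grade."""
    return {
        grade: score - norm_score(*age)
        for grade, (score, age) in TABLE_VI.items()
    }


if __name__ == "__main__":
    for grade, gap in selection_gaps().items():
        print(f"Grade {grade}: {gap:+d}")
```

On these figures the gap is negative in Grades III to V and positive from Grade VI upward, reaching 8 points in Grades IX and X.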

By successive readjustment of the figures of the first tentative
table in the light of considerations of the type represented by these
several forms in which the data were developed, the norms as
represented in Table I were finally fixed upon.

The norms for the extreme ages represented in Table I call for a
further word of explanation. The Delta 2 Examination is probably
not a very satisfactory measure of intelligence below the age of nine
years. Norms for seven years and eight years are given, not as
indicating that the test should in general be used for these lower ages,
but in order to give some relative value to the low scores which are made
by children of nine and higher chronological ages. While the degree of
exactness characteristic of the other portions of the table is absent
from these lowest age norms, they will serve a genuine purpose in
indicating something regarding the mental quality of older children
who make these lower scores in the test.

While it is customary in stating age norms for group intelligence 
tests to assume a negative acceleration in the mental growth curve 
for ages beyond 14, and a cessation of mental development at 16 years, 
the results of the Delta 2 Examination indicate the desirability of 
stating age norms for chronological ages beyond sixteen. Almost 
without exception the reports from high schools indicate an increase of 
intelligence scores with each successive school grade (see Table IV). 
Reports from colleges and normal schools indicate a still higher level 
in many if not all cases. One obvious explanation for this increase of 
scores is the operation of the selective function of the school program 
which eliminates, year by year, the less intelligent pupils from the 
schools. It has not, however, been proved that those children who 
remain throughout the high school do not actually improve in their 
ability to make scores on group intelligence tests of the type of Delta 2. 
There is at least a fair assumption that just this increase of ability 
does take place, and that the actual increase of scores found from 
Grade IX to Grades XI, XII and XIII is only in part due to the factor 
of selection and in part due to actual mental growth. The construc- 
tion of age norms, therefore, from 16 to 20 will serve a useful purpose in 
giving the means by which these children of the higher ages may be 
rated for relative intelligence. 
If it should ultimately be shown that the whole increase in median
scores from grade to grade, and age to age, is due to the operation of
selection, then the age norms would offend only by giving these older 
and more intelligent pupils a relatively lower rating in terms of 
intelligence quotients than they should actually have. If on the other 
hand it should ultimately be shown that the increased score is due to 
actual mental growth, then there is no offense except that of inaccurately 
measuring the amount of this growth. If, as the writer suspects, 
the truth lies between these extremes these upper age norms will 
still provide a more accurate measure of intelligence than if the growth 
curve should become horizontal at 16 years. 


FACTS DERIVED FROM THE USE OF THE DELTA 2 EXAMINATION


It may be worth while in connection with the publication of the 
new age norms to add certain data bearing upon the usefulness of 
the test as a measure of intelligence and as a basis for predicting 
school success. 

The crucial question is this: Do children who score highest achieve 
in school work the same relative standing that they do in the intelli- 
gence examination? Doubtless the most accurate method of deter- 
mining this relationship is by calculating coefficients of correlation 
between the scores of the test and other measures of ability and success. 
Such coefficients will be given in considerable numbers. First, 
however, we may use the simpler method of decile comparison. 


DECILE GROUPS


The data represented in Table VII and in Figure 4 are derived
from tests on 200 unselected Grade VIII pupils. The figures in the
first column of the table number the successive deciles based on the
Delta 2 test; the figures of the second column are the median Delta
2 scores for the several decile groups; and the figures of column 3
are the summated scores for the several decile groups in the following
tests: Silent Reading (Haggerty Sigma 3, Form B); Spelling
(Ayres-Breed); Addition and Multiplication (Woody); History, Information
and Thought (Van Wagenen); and Arithmetical Problems (Delta 2,
Exercise 2). The crossed bar in Figure 4 represents the decile
intelligence; the black bar shows the decile achievement. There is evident
here a regular increase in achievement comparable to the increase
of intelligence scores.
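The decile comparison just described can be illustrated with a minimal Python sketch. It assumes each pupil is recorded as a pair of a Delta 2 score and a total achievement score, that the pupils are ranked on Delta 2 and cut into ten groups of equal size, and that each group is summarized by its medians; the function name `decile_medians` is hypothetical and the paper does not describe how ties at the group boundaries were handled.

```python
from statistics import median


def decile_medians(pairs):
    """Median Delta 2 and achievement score for each Delta 2 decile.

    pairs: list of (delta2_score, achievement_total) tuples, one per pupil.
    Needs at least ten pupils so that no decile group is empty.
    Returns ten (decile, median_delta2, median_achievement) rows,
    decile 1 being the lowest-scoring tenth.
    """
    ranked = sorted(pairs, key=lambda pair: pair[0])
    n = len(ranked)
    rows = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        rows.append((
            d + 1,
            median(delta2 for delta2, _ in group),
            median(total for _, total in group),
        ))
    return rows


if __name__ == "__main__":
    # Synthetic data only: 200 pupils whose achievement rises with Delta 2.
    synthetic = [(score, 2 * score) for score in range(1, 201)]
    for row in decile_medians(synthetic):
        print(row)
```

With 200 pupils each decile group holds 20 cases, as in the Erie County data.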

TABLE VII.—MEDIANS IN TOTAL SCORES OF ACHIEVEMENT TESTS FOR EACH
DECILE GROUP IN INTELLIGENCE EXAMINATION DELTA 2¹

Decile groups     Delta 2     Total achievement
      1           [row illegible in source]
      2             96.5           176.5
      3            103.5           194
      4            109.5           193
      5            113.5           204.5
      6            118             210.5
      7            122.5           222
      8            127.5           229.5
      9            134             246
     10            144.5           268.5

¹ Two hundred cases being all Grade VIII pupils tested with all tests in Erie
County, New York.

Fig. 4.—Comparison for each decile group in Intelligence Examination, Delta 2,
between median total achievement scores and median Delta 2 scores. Two hundred
cases being all Grade VIII pupils tested with all tests in Erie County, New York.

COEFFICIENTS OF CORRELATION


The correlation method of evaluation is represented in Figure 5,
which shows the relation existing between the scores of the Delta 2
test and the criterion scores for 232 12-year-olds in the schools of
Westchester County, New York. The criterion score in this case was
composed of three items: the grade location of the pupil, his teacher’s
rating for scholarship, and the scores which he made in the Haggerty
Reading Examination, Sigma 3. In combining these items, the grade
location was multiplied by 10, and the teacher’s ratings for scholarship,
given in the figures 5, 4, 3, 2, and 1, were equated to the
numbers 9, 7, 5, 3, and 1. The raw scores in the reading examination
were used. The maximum score possible from this combination was
238 points. The actual maximum was 205 and the median score for
the group was 130.
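The composition of the criterion score can be written out as a small Python sketch. The weights are those stated above; the names `RATING_POINTS` and `criterion_score` are hypothetical, and the sample values in the usage lines are invented for illustration and are not taken from the Westchester data.

```python
# Teacher's scholarship rating (5 = highest) -> points, as stated in the text.
RATING_POINTS = {5: 9, 4: 7, 3: 5, 2: 3, 1: 1}


def criterion_score(grade, teacher_rating, reading_raw):
    """Grade location x 10, plus rating points, plus raw Sigma 3 score."""
    return 10 * grade + RATING_POINTS[teacher_rating] + reading_raw


if __name__ == "__main__":
    # Invented example: a Grade VI pupil rated 3 with a raw reading score of 60.
    print(criterion_score(6, 3, 60))
```

Because the grade location is multiplied by 10, one grade of advancement counts for as much as ten points of raw reading score, while the teacher's rating can move the total by at most eight points.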


The relation represented in Figure 5 will be clear from the following 
description. 

“The numbers along the base line represent the criterion score.
The heavy horizontal line across the middle of the figure indicates the
median score (96) in the intelligence examination, Delta 2. The
horizontal line next above (+1Q) is placed at a distance from the median
which is equivalent to the semi-interquartile range (Q) of the scores
in the Delta 2 examination. The second horizontal line (+2Q) above
the median is placed at twice the distance of the semi-interquartile
range above the median. Similarly, +3Q represents 3 times this
measure of variation. In like manner the horizontal lines -1Q, -2Q,
and -3Q represent corresponding distances below the median.

The heavy vertical line (M) represents the median criterion score
(130). The lines +1Q, +2Q, and +3Q represent distances above
the median of the criterion score equivalent to 1, 2, and 3 times the
semi-interquartile range (Q) of the criterion scores of the 232 children.
The vertical lines -1Q, -2Q, and -3Q represent similar distances below
the same median.”

The dots in the figure represent individual children whose criterion 
score may be obtained by locating on the base line the vertical for 
each dot and whose test score is shown on the ordinate at the left. 

“All of the dots inclosed within the two diagonal lines represent
children who do not differ in their relative standing in one test from
their relative standing in the other test by an amount greater than the
semi-interquartile range in either test. The children represented by
the dots outside the diagonal lines represent cases which do differ
in one test from the median score in that test by an amount relatively
larger than the variation which they achieve in the other test. To put
it in another way: The dots within the diagonal lines represent
children who are grouped in approximately the same manner by the
two measures used. The dots outside the diagonal lines show children
who are given different relative standings by the two measures. The
fact that relatively few dots are found outside the diagonal lines
indicates that the scores in the two measures give approximately the
same kind of classification.”
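The reading of the diagonal band can be sketched in Python by expressing each child's two scores as deviations from the median in units of the semi-interquartile range Q, and marking the children whose two standings differ by more than one such unit. This is an illustration under assumptions: the function names are hypothetical, the quartiles are computed with the default method of Python's `statistics.quantiles`, which need not agree with the method used for the figure, and the band is taken to be exactly one Q wide on either side.

```python
from statistics import median, quantiles


def semi_interquartile_range(values):
    """Q = (third quartile - first quartile) / 2."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2


def q_units(values):
    """Each value as a deviation from the median, in units of Q."""
    mid = median(values)
    q = semi_interquartile_range(values)
    return [(value - mid) / q for value in values]


def outside_band(test_scores, criterion_scores, width=1.0):
    """Indices of children whose two standings differ by more than `width` Q."""
    standings = zip(q_units(test_scores), q_units(criterion_scores))
    return [i for i, (u, v) in enumerate(standings) if abs(u - v) > width]


if __name__ == "__main__":
    # Invented data: nine children; the first and last swap criterion scores.
    delta2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    criterion = [90, 20, 30, 40, 50, 60, 70, 80, 10]
    print(outside_band(delta2, criterion))
```

Measuring both scales in units of their own Q is what allows a test score and a criterion score on quite different numerical scales to be compared directly.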

The coefficient of correlation (Pearson Product Moment Method)
for the data shown in Figure 5 is 0.86 ± 0.0118.
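A coefficient reported in the form r ± PE can be computed as in the following Python sketch. The paper does not state the formula for the probable error; the sketch assumes the conventional one, PE = 0.6745 (1 - r²) / √n, and the names `pearson_r` and `probable_error` are hypothetical. Applied to r = 0.60 with 152 cases, this assumed formula gives about 0.035, which agrees with the value printed for the two history tests; applied to r = 0.86 with 232 cases it gives about 0.0115, close to the 0.0118 reported here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment coefficient of correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def probable_error(r, n):
    """Probable error of r under the assumed conventional formula."""
    return 0.6745 * (1 - r * r) / sqrt(n)


if __name__ == "__main__":
    print(round(probable_error(0.60, 152), 3))
    print(round(probable_error(0.86, 232), 4))
```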

To supplement the two illustrations just given reference may be
made to published studies. The first critical study of the Delta 2
Examination published by any person other than the writer was that
by Holley,¹ who used with the same pupil groups in the public schools of
Champaign the following tests: (1) Otis Group Intelligence Scale;
(2) Theisen-Fleming Classification Test, Form A; (3) Whipple’s
Group Test for Grammar Grades; (4) Pressey Primer Scale; (5)
Haggerty Intelligence Examination, Delta 2, and (6) Holley Sentence
Vocabulary Scales.

The average coefficient of correlation of all the test results and the
teachers’ rating for scholarship was +0.462. The coefficients for the
Delta 2 were as follows:

[label illegible] ........................ .57 ± .06
[label illegible] ........................ .45 ± .05
[label illegible] ........................ .56 ± .04
[label illegible] ........................ .69 ± .03
[label illegible] ........................ [coefficient illegible]
[label illegible] ........................ .58 ± .05

The average coefficient is 0.592. None of the other tests given yielded
such uniformly high coefficients.

A valuable method by which to determine the value of one
intelligence test is to check it against other intelligence tests of known or
assumed validity. Such a study was made by Stenquist,² who reports
the results of an extended investigation on the validity of group
intelligence examinations, of which the Delta 2 was one. In this study
a criterion composed of the sum of all the test results was used as a
measure of each test. The coefficients of correlation for each test
with this criterion are as follows:





¹ Holley, Charles E.: Mental Tests for School Use. University of Illinois
Bulletin, Vol. 17, No. 28, 1920.
² Stenquist, John L.: Unreliability of Individual Scores in Mental
Measurements. Journal of Educational Research, Vol. IV, p. 347 ff.

[label illegible] ............... r = 0.801 (n = 560)
[label illegible] ............... r = 0.788 (n = 518)
Otis (Advanced) ................. r = 0.680 (n = 551)
Haggerty, Delta 2 ............... r = 0.808 (n = 532)
Visual Vocabulary ............... r = 0.680 (n = 461)
Kelley-Trabue ................... r = 0.58 (n = 581)
Meyers Mental Measure ........... r = 0.48 (n = 544)
Woody-McCall Arithmetic ......... r = 0.39 (n = 298)


The coefficient for the Delta 2 is as high as that for any test, 
slightly higher than for some and very much higher than for others. 


Stenquist also reports correlations for Delta 2 with other group tests 
as follows: 


National Scale A, 500 cases in Grades IV to VIII ... r = .81 ± .01
[label illegible] ................................... r = .59 ± .02
National Scale B, 50 cases .......................... r = .69 ± .04


A similar study is reported by Franzen,¹ who used also the method of
partial correlation in an attempt to evaluate each of 14 group intelli- 
gence examinations. All of the tests were given to the same group of 
57 first-year high school pupils. Each of the tests was checked against 
a criterion composed of the sum of the scores in all the tests. The 
results were presented in tables showing among other things the 
correlation of each test with the total, the correlations of each test 
with the thirteen others, and the inter-correlations of all tests, with 
reading ability (Thorndike Alpha 2) rendered constant. From these 
data Franzen draws his conclusions as to the value of the several tests. 
Table VIII gives the chief findings of Franzen’s study. The Delta 2 
he includes along with the Otis and the National Tests, all of which 
“give a fairly good account of themselves” in all of the tables. 
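The idea of rendering reading ability constant can be illustrated with the standard first-order partial correlation, sketched below in Python. The formula is the usual textbook one and is offered as an assumption about the kind of computation involved, since Franzen's manuscript is unpublished and its exact procedure is not given here; the function name `partial_r` and the sample coefficients are hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial).

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Invented coefficients: two tests correlate 0.80 with each other and
    # 0.60 each with a reading test.
    print(partial_r(0.80, 0.60, 0.60))
```

When neither test correlates with reading the partial coefficient equals the original one; the more both tests depend on reading, the more the coefficient between them shrinks once reading is held constant.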





¹ Franzen, Raymond: Unpublished manuscript.


TABLE VIII

Column 1: Average of correlations of each test with the other 13 tests.
Column 2: Correlation of each test with total score of 13 tests.
Column 3: Correlation [wording partly illegible in source], with reading
          (Thorndike Alpha 2) rendered constant.

                                 1       2       3
 1. [label illegible]           .75     .92     .85
 2. [label illegible]           .74     .93     .84
 3. [label illegible]           .73     .91     .78
 4. Illinois General            .72     .90     .71
 5. [label illegible]           .71     .92     .78
 6. Mentimeter                  .66     .81     .87
 7. [label illegible]           .65     .87     .80
 8. [label illegible]           .62     .80     .78
 9. Thorndike Reading           .59     .81     ..
10. Dearborn 1                  [?]     [?]     .81
11. Pressey Cross-outs          .56     .70     .61
12. Dearborn 2                  .55     .72     .75
13. [label illegible]           .53     .64     .24
14. [label illegible]           .46     .63     .77





Gates,¹ reporting a recent study on the relation of achievement
in school subjects to the scores in intelligence tests, cites the correlation
of each of 14 intelligence tests with each of the others, and the
correlation of each test with a composite measure of achievement in school
subjects. He finds only two tests with a higher mean
inter-examination correlation than the Delta 2 (see Table IX). The advantage
of one of these, which requires a third more time in giving, is only 0.01
and of the other, which requires more than double the amount of time,
is but 0.05. Only two of the group examinations requiring as small
an amount of time (National A and National B) show as high
correlations with the composite of achievement. It shows practically
the same correlation to the composite of achievement (0.52) as does
the Stanford Revision of the Binet Scale, and no group examination
employing so small a measure of time showed quite so high a correlation
with the Stanford Mental Age.

¹ Gates, Arthur I.: The Correlations of Achievement in School Subjects with
Intelligence Tests and Other Variables. Journal of Educational Psychology, Vol.
XIII, p. 223.


Important figures collected from several of Gates’s tables here
follow:


TABLE IX

                              1           2             3            4
                            Time     Correlation     Mean r       Mean r
                          (minutes)     with          with         with
                                      Stanford-     achieve-     13 tests
                                        Binet         ment
[label illegible]            80         .58           .47          .44
[label illegible]            47         .61           .63          .53
[label illegible]            45         .49           .43          .43
[label illegible]            35         .52           .38          .41
National total               33         .51           .63          .50
Thorndike-McCall             30         .57           .48          .46
[label illegible]            27         [?]           .55          .49
HAGGERTY, DELTA 2            21         .48           .62          .48
[label illegible]            17         .47           .56          .48
[label illegible]            17         .45           .48          .48
[label illegible]            16         .45           .66          .47
[label illegible]            [?]        .28           .12          .21
[label illegible]            12         .42           .43          .37





Miller¹ reports the results of correlating the scores in several
intelligence tests with each other and with the school marks of 55 Grade
IX pupils. The relation which the Delta 2 bears to these several
measures may be seen in the following table:


¹ Manual of Directions, Miller Mental Ability Test, p. 21.

CORRELATIONS (PEARSON) OF MILLER TEST WITH OTHER TESTS AND WITH SCHOOL
MARKS—55 GRADE IX PUPILS, UNIVERSITY OF MINNESOTA HIGH SCHOOL

[Column headings illegible in source; eight columns of coefficients]

[label illegible]              .784   .747   .76    .768   .891   .903   .563   .734
Delta 2                        ..     .817   .778   .685   .904   .884   .503   .715
Terman, Form A                 ..     ..     .823   .714   .931   .929   .586   .741
[label illegible]              ..     ..     ..     .712   .842   .914   .564   .716
[row illegible]
Average first three tests      [position uncertain]   .975   .562
Average 5 tests above          [position uncertain]   .60    .841

All correlations are positive.


On the basis of tests given by Dickson in the Oakland Schools, 
the Delta 2 shows a coefficient of 0.65 + 0.039 with the Army Alpha. 
From the same data the coefficient of correlation with the Stanford 
Revision of the Binet Scale has been found to be 0.84 + 0.018. Simi- 
lar figures have been furnished the writer by Superintendent Bliss, 
Dr. Elizabeth Woods, and others. 

The writer has calculated coefficients of correlation using the scores
from the Delta 2 and the Haggerty Reading Examination, Sigma 3, 
Form B and the Miller Mental Ability test on 442 Grade IX pupils. 
The results are as follows: 

TABLE X.—COEFFICIENTS OF CORRELATION BASED ON 442 CASES OF GRADE IX
PUPILS IN LARGE HIGH SCHOOLS, INVOLVING INTELLIGENCE EXAMINATIONS
DELTA 2, READING EXAMINATION SIGMA 3, AND MILLER MENTAL ABILITY

Test                             Delta 2    Sigma 3    Delta 2 and    Miller
                                                         Sigma 3
Delta 2                 r  =       ..        .62          .92          .61
                        PE =       ..       ±.021        ±.006        ±.021
Sigma 3                 r  =      .62        ..           .85          .79
                        PE =     ±.021       ..          ±.009        ±.012
Delta 2 and Sigma 3     r  =      .92        .85          ..           .55
                        PE =     ±.006      ±.009         ..          ±.025
Miller                  r  =      .61        .79          .55          ..
                        PE =     ±.02       ±.012        ±.025         ..






The relation of the Delta 2 test to the Van Wagenen history 
scales may be observed in Table XI which gives the coefficients of 
correlation based on the scores of 152 Grade VIII pupils. 
TABLE XI. COEFFICIENTS OF CORRELATION. INTELLIGENCE EXAMINATION
DELTA 2, READING EXAMINATION SIGMA 3, AND VAN WAGENEN HISTORY
TESTS. ONE HUNDRED AND FIFTY-TWO CASES IN GRADE VIII

                                  Intelligence   Reading        Delta 2 and
                                  examination    examination    Sigma 3
                                  Delta 2        Sigma 3

History information       r  =    .45            .50            .54
                          PE =    ±.043          ±.041          ±.04

History thought           r  =    .71            .78            .69
                          PE =    ±.028          ±.024          ±.03

Combined history tests    r  =    .63            .63            .79
                          PE =    ±.033          ±.035          ±.024

Coefficient: History information and History thought = 0.60 ± 0.035.
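
The probable errors attached to the coefficients in these tables appear consistent with the classical large-sample expression P.E. of r = 0.6745 (1 - r²) / √N. The paper does not state which formula was used, so that choice is an assumption here, and the function name below is illustrative only. A minimal Python sketch:

```python
import math

def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson coefficient r from n cases.

    Assumes the large-sample formula 0.6745 * (1 - r**2) / sqrt(n);
    the source does not say which formula it used.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Coefficient between History information and History thought, 152 cases.
pe = probable_error_of_r(0.60, 152)
print(f"0.60 +/- {pe:.3f}")  # prints 0.60 +/- 0.035
```

Under that assumption the sketch returns the 0.035 printed beside the coefficient of 0.60.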


The actual significance to be attached to any or all of the coefficients
of correlation printed in the foregoing tables cannot be accurately
stated. They are calculated on groups which vary greatly in
character and are subject to various modifying influences which may 
unduly raise or lower the theoretically correct figures. The constancy, 
however, with which different investigations report significant coeffi- 
cients is fairly conclusive evidence that the Delta 2 Examination has 
high rank among tests of this type. It is hoped that the new table of 
age norms printed herewith will measurably increase the usefulness 
of the test by enabling a more accurate determination of mental 
ages and intelligence quotients. 
AN ANALYSIS OF THE ERRORS IN MENTAL
MEASUREMENT 


KARL J. HOLZINGER 
University of Chicago 


In the early development of tests and scales we were content to 
apply a good deal of biometric statistical method in the construc- 
tion of the tests themselves and in their applications to educational 
problems. This biometric calculus, however, was originally developed 
for the purpose of describing quite different types of traits from those 
studied in psychology and education. The description of a group of 
human skulls is a very different matter from the description of certain 
obscure reactions which go on within these same skulls, and for this 
reason some workers in mental measurement have come to question 
the strict applicability of many of the conventional statistical methods 
to their data. 

Spearman¹ recognized that in measuring mental traits certain
errors arise which do not need to be considered in describing fixed
physical objects. These errors in “faulty data” are chiefly due to the
fact that we are measuring variable traits by an indirect procedure.
Both the variability of the traits from moment to moment and the
indirectness of the determination lead to types of error which must be
recognized and studied with care. In the last two or three years the
attention of several American workers in tests has been turned to these
problems. Monroe, Kelley, Otis and others have set forth certain
formulae which attempt to take into account the degree of reservation
with which we may regard the accuracy of test results. The purpose of
this article is to analyze the more important of these errors, to indicate
tentative formulae for their study, and to point out their importance
in the interpretation of mental measurements and in scale construction.
To cover this much ground in a short space will imply sketchiness,
but it is hoped that if the analysis proves helpful the details may be
filled in later.


TWO KINDS OF EDUCATIONAL SCALES OR TESTS


In order to carry out the analysis suggested it is necessary to recognize
two types of scales which are in use at the present time. The first
kind may be characterized as the quality or product scale. Examples
may be found in the current composition scales and in the Ayres
Handwriting Scale. A sample of the pupil’s work is obtained and this is
matched by the teacher with a scaled specimen of known merit. The
score is that of the specimen which the sample most closely resembles.
Now it is clear that several kinds of error may creep into an evaluation
of this sort. The scale itself may not be accurately set up, the pupil
may give an unrepresentative sample of his work for matching, and
finally the teacher may rate the sample inaccurately.

1 Spearman, C.: The Proof and Measurement of Association between Two
Things. Amer. Jour. of Psy., Vol. XV, 1904.

The second type of test may be called a performance test. In this 
case the pupil responds directly to the test material which is set before 
him. His score is determined by his direct response to the questions 
or items which he is required to cover. Here again, the material
may not be well graded, the pupil may not make a representative 
response, and different scorers may not agree in the numerical value 
to be assigned to a given performance. With objective tests this last 
type of error may be practically eliminated. The difference between 
product and performance scales from the point of view of errors lies 
chiefly in the fact that the scoring of the latter may be made much 
more objective. 


TYPES OF ERROR IN MENTAL MEASUREMENT


It is now possible to formulate the kinds of error which need to 
be studied in connection with both types of scales. We shall 
enumerate five with the understanding that the classification is neces- 
sarily crude at the present stage of development in mental 
measurement. 

1. Scale Errors.—(a) In product scales these errors are due to imper- 
fections in the material arising from poor selection and graduation of 
the specimens. In matching a pupil sample with these specimens an 
error will occur by assigning the incorrect scale value of the specimen. 
For a group of pupils rated by the same teacher such errors will tend 
to be constant, i.e., the same for all pupils whose work is matched with
a given specimen. They are difficult to study because true scale values
are unknown, and are further obscured by the subjective procedure 
in rating pupil samples. 

(b) In performance tests the problem is again obscured by the 
direct response of the pupil to the test material. The selection, 
graduation, and arrangement of the items will affect the response made 
by the pupil. The writer constructed a reasoning test which on analy- 
sis was found to measure speed in handwriting to a very considerable 
extent. It was also found that a repetition of the test did not produce
very consistent results. By changing the form of the response required 
and by lengthening the test these two defects were largely remedied, 
but a new factor of fatigue was introduced. The test was improved 
in its validity and reliability, by which are meant the extent to which
it measured what it purported to measure, and the consistency of response
on repetition with the same pupils. Scale error with performance
tests is thus so intimately related to response that it is difficult to deter- 
mine how much of the variation with a repeated test is due to imper- 
fections in the material and how much to actual fluctuations in ability. 
In the present paper we shall not attempt to analyze this problem 
any further, but will classify the gross fluctuations on repetition as 
errors in response with the understanding that the chief contributing 
cause is usually immediate variation in ability. 

2. Scoring Error (γ). In measuring pupils’ abilities with tests,
numerical values are assigned by the examiner to the samples to be 
matched with scale specimens or to the responses made on performance 
tests. Scoring procedures of both types of tests will lead to errors 
which may be distinguished from those already mentioned. In the 
case of the product scale, scoring error will in general be large on 
account of the subjectivity involved in estimating sample merit. In 
Dr. Theisen’s report¹ on the Trabue Scale a single composition, C-3,
was rated all the way from 2.8 to 9.0 by 15 teachers. For perform- 
ance tests, on the other hand, such errors will usually be small. It is 
possible to prepare test material and to formulate scoring directions 
so that competent examiners will score a given performance in the 
same way. 

3. Response error (δ) is due to the fact that pupils respond differently
on successive trials with a test when short time intervals separate
the trials. These fluctuations may be attributed to effort, emotional
status, concentration, etc. They cannot be ascribed very well to any
fundamental change in the ability in question. We are measuring 
mental traits which exhibit instantaneous variation and we can never 
be sure at what phase of this variation a given performance occurs. 

The procedure is a good deal like that which would be involved
in measuring the length of earthworms during a series of expansions
and contractions. We should not think of comparing the length of
two worms from single measurements under such varying conditions.
If we wished to make a useful comparison we would probably make
several determinations and strike an average for each worm. Now a
human ability is more difficult to measure than the length of an earthworm
because we can never be sure whether a particular performance
occurs under “expansion” or “contraction.” The best method we
have at present is to make at least two determinations for each individual
in a group and from the series of differences thus obtained set
up a measure of the probable divergence of a given score from the
theoretically true score. The factor of group practice effect may be
eliminated as will be indicated below.

1 Theisen, W. W.: Improving Teachers’ Estimates of Composition Samples
with the Aid of the Trabue, Nassau County, Scale. School and Society, Vol. VII,
February, 1918, pp. 143-50.

4. Sampling Error (ε). If we confine our statistical descriptions and
inferences to a particular group this error does not occur, but when we 
wish to extend these inferences beyond the results of the particular 
sample, it must be taken into account. The sampling error of a 
statistical constant tells us the amount and probability of the variation 
we may expect if the same constant is worked out from another sample. 
This index of reservation should be carefully distinguished from all
other types of “error,” yet it is not infrequently supposed to have something
to do with the arithmetical accuracy of the computations.
There is probably no conception in statistical method so commonly 
misunderstood. 

5. Sporadic errors are those due to arithmetical blunders in scoring, 
misunderstanding of test directions, time lost by the pupil with a 
broken pencil, etc. Such errors may be eliminated. They do not
lend themselves to mathematical treatment such as is possible in the 
case of scoring, response, and sampling error. 


FUNCTIONAL RELATIONSHIPS BETWEEN ABILITY AND SCORE 


Certain functional equations may now be set down expressing 
the relationships between ability and score with the types of error 
enumerated. If attention be confined to a particular group and no 
errors considered, the expression may be written, 

(a) Ability = f (score) 

i.e., given a particular score, the ability is uniquely determined. This
theory upon which much of our early statistical work was based is not 
tenable because errors do exist and may not be neglected. 

In the case of a product scale the relationship becomes 


(b) Ability = g (score, γ, δ)

if the existence of scoring and response error be admitted. For performance
tests which are objectively scored the equation is

(c) Ability = h (score, δ)

Finally, if inferences are extended beyond the particular group
measured, each of these three relationships will involve sampling
error, ε, so that the most general expression with which we have to
deal is

(d) Ability = k (score, γ, δ, ε)


FORMULA FOR SCORING ERROR WITH THE PRODUCT SCALE 


If several teachers rate a single sample with a product scale the
best assumption that can be made regarding these judgments is that
they are distributed according to the Gaussian law. Actual distributions
of residuals check the assumption fairly well. Employing the
usual formula found in any work on least squares we have

$$\mathrm{P.E.}_{X_\gamma} = 0.6745\sqrt{\frac{\sum v^2}{n-1}} \tag{1}$$

where X = obtained rating, v = X − M = variation of such a rating
from the mean, i.e., the residual, and n = number of judges. In the case
of the 15 judgments of composition C-3, mentioned above, P.E. = 1.73,
which means that it is an even chance that the true judgment will
vary from the obtained by this amount.
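
Formula (1) can be sketched in Python as follows. The sketch assumes the n − 1 divisor of the usual least-squares expression for the probable error of a single observation. The ratings are hypothetical (they are not Theisen's data) and the function name is illustrative.

```python
import math

def probable_error_of_rating(ratings):
    """P.E. of a single rating: 0.6745 * sqrt(sum(v**2) / (n - 1)),
    where v is each rating's deviation from the mean of all judges.

    The n - 1 divisor is an assumption taken from the usual
    least-squares formula for a single observation.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    mean = sum(ratings) / n
    sum_sq = sum((x - mean) ** 2 for x in ratings)
    return 0.6745 * math.sqrt(sum_sq / (n - 1))

# Hypothetical ratings of one composition by five judges.
example = [3.0, 5.0, 6.0, 7.0, 9.0]
print(round(probable_error_of_rating(example), 2))  # prints 1.51
```

On these made-up ratings the probable error is about 1.5 scale units, so half of such judges would be expected to fall within 1.5 units of the mean rating if the Gaussian assumption holds.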

Response error with product scales is due to the differences in 
samples which pupils submit for rating. With constant scoring 
error a measure of variability in response may be obtained by working 
with the differences in successive samples for a given group. Formula (2)
(below) might then be applied. As a matter of fact scoring error will
not be constant for the two sets of samples to be compared, and will so 
obscure the result that a mathematical formulation is extremely diffi- 
cult. It will not be attempted in this paper. 





FORMULAE FOR RESPONSE AND SAMPLING ERROR WITH PERFORMANCE
TESTS 


Let $X_1$ and $X_2$ denote scores on successive trials of a test with a
given group, $\delta_1$ and $\delta_2$ the respective response errors, and $X'$ the true
score by either trial, so that $X_1 = X' + \delta_1$ and $X_2 = X' + \delta_2$.
Setting $d = X_2 - X_1 = \delta_2 - \delta_1$ we may obtain an expression for $\mathrm{P.E.}_{\delta}$
or $\mathrm{P.E.}_{X_\delta}$ (to distinguish it from the error in the mean) in terms of d,
which is the known difference between successive pairs of scores.
According to the law of error $\sigma_d = \sqrt{\sigma_{\delta_1}^2 + \sigma_{\delta_2}^2}$, or $\sqrt{2}\,\sigma_\delta$, if the
standard errors on both trials are equal and the individual errors
uncorrelated. We may then write

$$\mathrm{P.E.}_{X_\delta} = \frac{0.6745}{\sqrt{2}}\,\sigma_d = 0.477\,\sigma_d \tag{2}$$

It is evident that this function is independent of group practice effect,
$M_2 - M_1$, since $\sigma_{d+c} = \sigma_d$ where c is a constant. Formula (2) is
the same as that obtained by Monroe and others. It is equivalent
to Merriman’s¹ formula (39) with the assumption that the scores are
of equal weight.
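
A minimal Python sketch of formula (2), read as P.E. = (0.6745/√2) σ_d with d the difference between a pupil's two scores. The scores are hypothetical, the function names are illustrative, and the standard deviation is taken about the mean with divisor n (an assumption; the paper does not specify the divisor). The sketch also checks the remark that a constant practice gain leaves the result unchanged.

```python
import math

def sd(values):
    """Standard deviation about the mean (divisor n)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / n)

def response_pe_from_differences(first, second):
    """Formula (2): P.E. = (0.6745 / sqrt(2)) * sigma_d, with d = X2 - X1."""
    d = [b - a for a, b in zip(first, second)]
    return 0.6745 / math.sqrt(2) * sd(d)

# Hypothetical scores of six pupils on two trials of the same test.
trial_1 = [40, 52, 47, 60, 55, 49]
trial_2 = [44, 50, 53, 63, 54, 55]
pe = response_pe_from_differences(trial_1, trial_2)

# Adding a constant practice gain to every second score changes the mean
# difference but not its spread, so the probable error is unchanged.
shifted = [x + 5 for x in trial_2]
assert math.isclose(pe, response_pe_from_differences(trial_1, shifted))
print(round(pe, 2))
```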

The writer has attempted to check some of the assumptions
involved in the above proof by means of empirical tests. The assumption
that the error, δ, is normally distributed is checked roughly by
the fact that distributions of $d = X_2 - X_1$ resemble the Gaussian
curve rather closely. Data will be presented in a later article. From
the results of repeated tests in arithmetic and intelligence it was also
possible to check the assumption $r_{\delta_1\delta_2} = 0$ by working out $r_{d_1 d_2}$. Six
values for arithmetic material are as follows: 0.05, +0.04, +0.11,
+0.13, +0.15, +0.32 (N = 62, Grade V), the last coefficient being
significant. Brown and Thomson² find higher correlation with different
test material, and on the strength of such evidence question the
validity of Spearman’s formulae for attenuation. Finally the assumption
that the scores are of equal weight implies that they will have
equal probable errors. The correlation $r_{X\delta}$ should therefore be zero
within the ordinary limits. By actual test $r_{Xd}$ = −0.22, −0.26,
−0.27, −0.29 (N = 62, $\mathrm{P.E.}_r$ = 0.08). It therefore appears that a
small response error is associated in general with a high score and vice
versa. This is contrary to the law for accidental errors of observation,
i.e., $\mathrm{P.E.}_{l} = \mathrm{P.E.}_{1}\sqrt{\text{length}}$, which implies that the larger errors are
associated with larger linear magnitudes. On the whole the assumptions
involved in the proof of (2) appear to be roughly justified but a
good many careful tests with different types of material are needed.


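Formula (2) may be written in another form. Since $d = X_2 - X_1$,

$$\sigma_d^2 = \sigma_{X_2}^2 - 2 r_{12}\,\sigma_{X_1}\sigma_{X_2} + \sigma_{X_1}^2 \quad\text{or}\quad \sigma_d = \sigma_X\sqrt{2(1 - r_{12})} \quad\text{if } \sigma_{X_1} = \sigma_{X_2}.$$

Substituting in (2) gives

$$\mathrm{P.E.}_{X_\delta} = 0.6745\,\sigma_X\sqrt{1 - r_{12}} \tag{3}$$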


1 Merriman, M.: “Method of Least Squares.” New York: John Wiley & Sons,
1915.

2 Brown and Thomson: “Essentials of Mental Measurement.” England:
Cambridge Press, 1921, p. 158.
This formula is often more convenient than (2) because if the reliability
coefficient is known, the required probable error may be readily
obtained. The two formulae serve as mutual checks.
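
The mutual check between formulae (2) and (3) can be illustrated in Python. The sketch reads (2) as P.E. = (0.6745/√2) σ_d and (3) as P.E. = 0.6745 σ √(1 − r₁₂); the scores are hypothetical and are built so that both trials have the same standard deviation, which is the condition under which the two expressions coincide. All function names are illustrative.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation about the mean (divisor n)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r(x, y):
    """Product-moment correlation between paired lists x and y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sd(x) * sd(y))

def pe_formula_2(x1, x2):
    """P.E. of a score from the spread of the differences d = X2 - X1."""
    d = [b - a for a, b in zip(x1, x2)]
    return 0.6745 / math.sqrt(2) * sd(d)

def pe_formula_3(x1, x2):
    """P.E. of a score from the reliability coefficient r12.

    Uses the first trial's standard deviation, i.e. assumes both
    trials have the same spread.
    """
    return 0.6745 * sd(x1) * math.sqrt(1 - pearson_r(x1, x2))

# Hypothetical scores; trial_2 has the same spread as trial_1.
trial_1 = [40, 45, 50, 55, 60]
trial_2 = [48, 43, 53, 63, 58]

print(round(pearson_r(trial_1, trial_2), 2))    # reliability coefficient
print(round(pe_formula_2(trial_1, trial_2), 3))
print(round(pe_formula_3(trial_1, trial_2), 3))
```

When the two trials differ in spread the two results part company slightly, which is one practical reason to compute both.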

The interpretation of the probable error of response is that it is 
an even chance that the next score or the true score of an individual 
will differ from the obtained by the given amount. This interpreta- 
tion must be modified, however, in the light of the above experiment 
on correlation between score and error. If a pupil makes a high score 
on a test the probable divergence of a second score will be less than that 
predicted by formulae (2) or (3).

As an example, $\mathrm{P.E.}_{X_\delta}$ for the Terman Group Intelligence Scale,
Forms A and B, with 135 first-year high school pupils was 6.6 points. 
Using 3P.E. as a criterion for safe prediction this means that a pupil’s 
true score will lie within the range 20 points below to 20 points above 
the obtained value with practice effect eliminated, or that we are 
reasonably sure that his true score is within 20 points of the obtained. 

The probable error in the mean is often more useful than that for
an individual score. Assuming that the law of error again holds we
may write

$$\mathrm{P.E.}_{M_\delta} = \frac{\mathrm{P.E.}_{X_\delta}}{\sqrt{N}} = \frac{0.6745\,\sigma_X\sqrt{1 - r_{12}}}{\sqrt{N}} \tag{4}$$

For the Terman Scale $\mathrm{P.E.}_{M_\delta}$ = 0.6 points. With groups of this size
(135) the response error in the mean is often relatively small, but not
negligible.

The formula for sampling error in the mean is

$$\mathrm{P.E.}_{M_\epsilon} = \frac{0.6745\,\sigma_X}{\sqrt{N}} \tag{5}$$

If we assume sampling and response error both present but uncorrelated,
formulae (4) and (5) may be combined so as to give an expression
useful in testing differences. Since $\mathrm{P.E.}_{a+b} = \sqrt{\mathrm{P.E.}_a^2 + \mathrm{P.E.}_b^2}$
for uncorrelated errors a and b we have

$$\mathrm{P.E.}_{M_{\delta+\epsilon}} = \frac{0.6745\,\sigma_X\sqrt{2 - r_{12}}}{\sqrt{N}} \tag{6}$$

It will be noted that when a test is perfectly reliable, i.e., $r_{12} = 1$,
this expression reduces to (5) and that as $r_{12}$ decreases the probable
error increases. With the Terman material $r_{12}$ = 0.91 and $\sigma_X$ = 16.8,
$\mathrm{P.E.}_{M_\epsilon}$ = 1.44 and $\mathrm{P.E.}_{M_{\delta+\epsilon}}$ = 1.47 so that the contribution of the
error δ appears to be relatively slight. With a reliability coefficient of
0.8 and $\mathrm{P.E.}_{M_\epsilon}$ = a, $\mathrm{P.E.}_{M_{\delta+\epsilon}}$ = 1.1a. Thus with ordinary tests a
difference of about 10 per cent may be expected when response as well
as sampling error is taken into account in interpreting the significance
of the mean.
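
The three probable errors of the mean, formulae (4), (5) and (6), can be sketched in Python as below. The sketch reads them as 0.6745 σ √(1 − r₁₂)/√N, 0.6745 σ/√N and 0.6745 σ √(2 − r₁₂)/√N respectively. The standard deviation and group size are hypothetical values chosen only to show the arithmetic, and the function names are illustrative.

```python
import math

PE_FACTOR = 0.6745  # converts a standard deviation to a probable error

def pe_mean_response(sigma, r12, n):
    """Formula (4): response error in the mean."""
    return PE_FACTOR * sigma * math.sqrt(1 - r12) / math.sqrt(n)

def pe_mean_sampling(sigma, n):
    """Formula (5): sampling error in the mean."""
    return PE_FACTOR * sigma / math.sqrt(n)

def pe_mean_combined(sigma, r12, n):
    """Formula (6): response and sampling error together."""
    return PE_FACTOR * sigma * math.sqrt(2 - r12) / math.sqrt(n)

# Hypothetical test: standard deviation 10, reliability 0.8, 100 pupils.
sigma, r12, n = 10.0, 0.8, 100
combined = pe_mean_combined(sigma, r12, n)

# (6) is the root-sum-square of (4) and (5).
assert math.isclose(
    combined,
    math.hypot(pe_mean_response(sigma, r12, n), pe_mean_sampling(sigma, n)),
)

# Ratio of the combined error to the sampling-only error at r12 = 0.8.
print(round(combined / pe_mean_sampling(sigma, n), 2))  # prints 1.1
```

The ratio depends only on the reliability coefficient, √(2 − r₁₂), which is why the 10 per cent figure holds for any group size or score spread.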


THE IMPORTANCE OF ERRORS IN INTERPRETING RESULTS 


At the risk of repetition it may be well to point out in some detail 
the importance of errors of the types analyzed in the interpretation 
of educational measurements. Few workers with tests view the score 
of a pupil from a single trial of a test with any great degree of confidence, 
but until recently they have not had adequate means by which the 
degree of reservation could be expressed in numerical terms. It is a 
rather enlightening thing for anyone who struggled through the Army 
test to know that on repetition it is an even chance that he would have 
scored 15 points higher (or lower). Where standardized tests are 
used to measure individual progress during remedial instruction it is 
also important to know whether or not the gain of a particular pupil 
is significant. Formulae (1), (2), and (3) help to answer these questions.

In the case of experimental work it is necessary to have satisfactory 
means for testing the difference between averages. For example a 
common procedure in evaluating a method of instruction is to equate 
the practice and control groups at the beginning of the experiment by 
the use of tests. At the end of the training period similar tests are 
administered and the difference in average gains by the respective 
groups taken as a measure of the superiority of the instructional 
method. Unfortunately such differences are usually slight and need to 
be interpreted with great care. They may be assignable to fluctuations 
in sampling, to variation in response, or to both and possibly other such 
errors. Formula (6) is therefore a useful device for problems of this 
type. 

An examination of equation (6) indicates that the reliability coeffi- 
cient of the test is required. This may be obtained by repeating the 
test after a short interval either at the beginning or at the end of the 
experiment. As a supposititious example let us assume that the mean 
gain for the practice group is 20 ± 2, while that for the control group
is 14 ± 3, the probable errors being obtained by formula (6). The
difference between the means may then be written
$20 - 14 = 6 \pm \sqrt{2^2 + 3^2} = 6 \pm 3.6$. In such a case the difference would not be
considered significant inasmuch as it is less than twice its probable error.
Such an experiment is therefore inconclusive. 
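
The arithmetic of this example, sketched in Python. The threshold of twice the probable error is the criterion used in the paragraph above; the function name is illustrative.

```python
import math

def pe_of_difference(pe_a, pe_b):
    """Probable error of a difference between two uncorrelated means."""
    return math.hypot(pe_a, pe_b)

gain_practice, pe_practice = 20, 2
gain_control, pe_control = 14, 3

difference = gain_practice - gain_control
pe_diff = pe_of_difference(pe_practice, pe_control)
ratio = difference / pe_diff

print(round(pe_diff, 1), round(ratio, 2))  # prints 3.6 1.66
print("significant" if ratio >= 2 else "not significant")
```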
RELATION OF SCORING ERROR TO THE SCALING OF PRODUCT TESTS


The neglect of scoring error has led to much unnecessary refinement
in the scaling of product tests. Scale specimens are frequently
graduated to the second decimal place in P.E. scale units whereas the
probable error $\mathrm{P.E.}_{M_\gamma}$ for several scorers is likely to be greater than a
scale unit. Assuming that teachers could be so trained as to reduce
$\mathrm{P.E.}_{X_\gamma}$ on a composition such as Trabue C-3 cited above, say from 1.73
to 0.5, it is questionable whether it pays to express the values of the
scale specimens beyond the nearest scale unit. It is surely confusing
to express them in units finer than can be discriminated in scoring.


RELATION OF RESPONSE ERROR TO THE SCALING OF PERFORMANCE
TESTS


A common method of scaling performance tests is to convert the 
point score or number of correct responses into scale score expressed 
in units of group variability. The conversion may be accomplished 
for each item individually or by using the total point score as suggested 
by McCall and others. All of the methods are essentially the same 
as that employed by Pearson in his classic study of intelligence. The 
integral of the normal curve is employed as the index variable. The 
advantages of such scaling are that scores are expressed in comparable 
units from a supposedly suitable zero point. The fundamental assumption
is that degree of difficulty is a measure of any ability, and this is of
course very questionable. Nevertheless it is probably the best single 
objective indication that we have. 

If such scaling be abandoned in favor of crude point score there will 
be a loss of the difficulty unit and at the same time of the zero point. 
It is worth while to raise the question now as to whether these losses 
are irreparable for the type of measurement which it is possible to 
achieve with performance tests.

In standardizing a set of 40 questions in French grammar the writer
had occasion to study the desirability of weighting the items in various
ways. The first procedure was to weight each question on the basis
of the responses of some 300 pupils. The correlation between simple
point score and this refined scale score was then worked out and found
to be over 0.99. A second plan was to scale the total score according
to the method of McCall.¹ Again the correlation between weighted
and unweighted score was over 0.99. The reliability coefficient of the
test was found to be about 0.90 using the same group. Similar
results have been found by Charters, Douglass, Monroe and others who
have decided in some cases to drop weights on their tests on the basis of
these findings. The issue is essentially this: If the correlation between
weighted and unweighted score is considerably higher than the
reliability coefficient of the test, does it pay to use the refined weighted
units? As in the case of the product scale such weighting may be an
elaborate and unnecessary refinement.

1 McCall, W. A.: A Proposed Uniform Method of Scale Construction. Teachers
College Record, Vol. XXII, January, 1921, pp. 31-51.

As a further check on the above studies in weighting, different 
types of material varying in length and difficulty need to be examined. 
The Hotz weighted algebra problems were studied by the same method. 
For a series of five or six problems varying considerably in weight, 
the reliability coefficients were found to be approximately 0.7 and the 
correlation between point and scale score about 0.9. With arithmetic 
problems and intelligence components corresponding values of 0.8 
and 0.95 were found. In no case did the reliability coefficient equal 
the correlation between weighted and unweighted scores. This latter 
correlation may be made to run higher than 0.98 by suitable graduation 
and lengthening of the test material. With linear regression this
means that

Scale score = a (point score) + b

where a and b are constants, i.e., scale score is approximately a linear
transposition of point score which amounts to a magnification and shift
of origin. The magnification if desired can be accomplished by much
simpler means, e.g., by multiplying each score by a constant, but there
is no gain in accuracy by using such units.
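
The point that a linear transposition adds nothing to accuracy can be illustrated in Python: the correlation of a score with any other measure is unchanged when the score is multiplied by a constant and shifted. The scores and the constants a = 2.5, b = 10 are hypothetical, and the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical point scores on a first trial and scores on a repetition.
point = [12, 18, 25, 31, 36, 40]
retest = [14, 17, 27, 28, 38, 37]

# A linear "scale score": magnification a = 2.5 and shift of origin b = 10.
scale = [2.5 * p + 10 for p in point]

r_point = pearson_r(point, retest)
r_scale = pearson_r(scale, retest)
assert math.isclose(r_point, r_scale)
assert math.isclose(pearson_r(point, scale), 1.0)
print(round(r_point, 3), round(r_scale, 3))
```

A weighted score that correlates 0.99 with point score is close to this limiting case, so its reliability can differ only slightly from that of the point score.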

By using the point score the origin is “no score made” instead of
the theoretical zero point, “just no ability” or difficulty. This loss of
the theoretical zero point does not seem to the writer to be a serious
one. In the first place quite different zero points are obtained by
different methods of scaling, e.g., by individual questions or by total
score. The method of determination is thus arbitrary. Furthermore
such a zero point is always a function of the difficulty of the particular
material used in the test. If more difficult material is used the zero
point shifts. The interpretation, “just no ability,” is thus a relative
matter.

The chief reason for wanting such a zero point is that one may be 
able to say “John has twice as much ability as Henry.” At first 
sight this looks like an interesting and valuable comparison to make, 









288 The Journal of Educational Psychology 


but practically it is of little value, and since it is always made relative 
to the particular material of the scale may be very misleading. Sup- 
posing, however, that the ability of the two boys has been so expressed, 
what advantage has such a comparison over that obtained by difference 
in score from any reference point? In temperature we measure from 
the arbitrary point zero degrees Fahrenheit. Clearly 20°F. is not twice
as hot as 10°F., yet the comparison has just as much meaning for the
purpose for which such measurements are made as if absolute zero had
been employed. There seems to be no good reason why the value “no
point score” may not replace the theoretical and more nebulous zero
points from scaling without serious loss in interpreting results. If 
another reference point is desired for refined comparison with different 
tests, the mean of the respective distributions is clearly the most 
stable and useful. 

The determination of difficulty values is often useful in graduating 
the test material to furnish comparable sets and to arrange the items 
so as to insure smooth progress by the pupil in taking the test. The 
Henmon French Tests furnish an illustration. In determining ratings,
however, point score may be substituted for refined scale score if the
material is sufficiently long and well graded. This simpler procedure
will insure a precision in harmony with the possible accuracy of such
measurements, and will be as practically useful as if the theoretical
zero point had been employed.
AN INTERPRETATION OF LAY ATTITUDES TOWARD
INTELLIGENCE TESTS 


DONALD A. LAIRD 


University of Wyoming 


In a recent paper Knight¹ presents the results of applying intelligence
tests to a group of teachers who were given the privilege of writing 
their names on the test papers or not, as they might choose. He found 
significant differences in the intelligence scores of the groups thus 
separated. On the whole the group that did not sign the test papers 
scored consistently lower. 

The present communication is concerned with the attitudes of a
group of 55 students in elementary psychology toward the Thorndike
Intelligence Examination for high school graduates. These students
were freshmen, who had taken this intelligence test as a routine part
of their admission to the university. They were given a written
assignment of writing seriously of this test, weighing everything
carefully and finally concluding as to whether they were “for” or
“against” the test. This was assigned and completed before they
knew what their scores were.

1 Knight, F. B.: The Significance of Unwillingness to be Tested. Journal
of Applied Psychology, 1922, Vol. VI, pp. 211-213.

The papers were sorted into two groups, one for and the other 
against this test. Then the composite score made by each student 
was marked on the paper and a distribution arranged by groups 
according to the score, using 5-point intervals. The results are given 
in the following table: 

The opinions against the test were voiced by those who did not
do well on the test, although at the time of writing their opinions
they had no means of comparing their scores with the scores made by
others. Thus, although they were instructed to list all the pros and
cons in their paper before even trying to decide for themselves what they
should think about the test, we find cropping out the unconscious
realization of their mediocrity in the weighed opinions.

A significant feature appears in these data as graphed in Chart I,
which is a surface of frequency by the two groups. It will be noted
that the frequency surface of the “for” group resembles, as closely
as one would anticipate with 55 cases, a normal distribution curve.
The other curve, however, is distinctly bi-modal. These few students
of higher intelligence who are grouped with those of lower intelligence
in their opinions do not controvert the generalization which has just
been made. For instance the three high men in the “against” group
comprise the freshman debating team. The opinions against the
intelligence test in the students with higher scores have probably
been reached in the light of reason, more nearly pure and simple, than
is the case with the lower scoring members of the “against” group.

Another approach to this problem was obtained in recording 
the degree of cooperation of each of these same students in a test 
designed to see if professors could tell the intelligent students by the 
pictures of the latter. The students were requested to hand in to 
their instructor a recent photograph or snap shot. One week later 
a reminder was given of these promised pictures. A week from the 
date of this reminder another request was given. The returns, again 
grouped by score interval, were as follows: 


TABLE II 

[Table II: number of pictures handed in during the first week, the second week, and the third week, and number not yet in, by 5-point score interval from 35-39 to 100-104. The cell entries are not legible in this copy.] 

These results confirm those reported in the first part of this note 
and the statements of Knight. 

This is but another example added to the many already advanced 
regarding the dominance of reasoning by personal motives rather 
than logical principles. Opposition to intelligence testing may arise 
from well grounded arguments, or it may arise, as the present 
communication shows, from a feeling that the score may be low or the test 
embarrassing. 
COMPARISON OF AMERICAN AND FOREIGN CHILDREN 
ON INTELLIGENCE TESTS¹ 


RUDOLF PINTNER 


Teachers College, Columbia University 


There seems to be much difference of opinion at the present time 
among psychologists interested in intelligence tests as to the validity 
of the conventional verbal group test as a measure of intelligence for 
foreign children. Some are inclined to believe that such tests give an 
accurate rating of our foreign children and that their reputed language 
handicap is itself an index of lack of intelligence. In this connection, 
Young² has shown that correlations of intelligence ratings with 
teachers’ estimates and school work generally run higher for a verbal 
than for a non-verbal test. But we should not forget that a teacher’s 
estimate of a child’s intelligence will unquestionably be influenced by 
the child’s ability to use the English language, and, of course, all the 
child’s school work is conditioned by his ability to understand and 
make use of English. It may be true, therefore, that for purposes of 
classification a verbal test is as good as a non-verbal, because ability 
to get on in school requires the use of the English language. If, how- 
ever, the school wishes to select the brighter foreign children for 
special work in English, a verbal test may not be so good. 

The question of prognosis value for school purposes must not be 
confused with the question of the absolute intelligence of different 
racial groups. It seems to the writer that non-verbal tests alone are 
adequate for this purpose. It is inconceivable that children living 
in an English-speaking environment, hearing, speaking, reading 
nothing but English should not have a distinct advantage in tests 
requiring the finding of opposites of words, the hunting for an appro- 
priate analogy, the filling in of an uncompleted sentence, and the like, as 
compared with children who hear a foreign language at home and in 
many cases are required to communicate in a foreign language to some 
people in their environment. Such contrasting groups are very far 
from having had equal previous practice on the elements which go to 
make up the usual verbal test. 


1 The writer wishes here to acknowledge the help of Mrs. A. H. Talbot in gath- 
ering the data necessary for this article, and in making the necessary tabulations. 

2 Young, K.: “Mental Differences in Certain Immigrant Groups.” Univ. of 
Oregon, Publication No. 11, July, 1912. 



In this connection the following data gathered in a New York City 
school may be of interest.¹ The children in the third and fourth grades 
were all given the National Intelligence Test, Scale A, Form 1, and the 
Pintner Non-language Test. The distribution curves for both tests 
show much the same type of distribution with no zero scores and no 
perfect scores. All scores were then converted into mental ages and 
Table I shows the percentage distribution of mental ages for the various 
nationality groups and for all groups combined. The American 
children were largely of Irish descent. Under German were included 
all children whose parents were born in Germany or Austria (the 
former Austrian Empire) and, therefore, in this group there are a 
number of Slavic nationality, judging by the family name. The 
Polish and so-called German groups are small and of little consequence. 

The median mental age for the total group shows a higher mental 
age on the Non-language as compared with the National, 9 years, 4 
months against 8 years, 9 months. This may mean that the children 
on whom the norms for the National were based were in general slightly 
superior to those who were used in the standardization of the Non-language 
or else that the children in this particular school were in general 
somewhat inferior in such verbal ability as is tested by the 
National Intelligence Test. This superiority on the Non-language 
Test is true not only of the foreign groups, as one might expect, but also 
of the American group where we have a median mental age of 9 years, 
4 months on the Non-language and 9 years, 0 months on the National. 
Comparing the separate Nationality groups we note that the medians 
on the Non-language for the Polish and Germans are above the median 
for the Americans whereas the median for the Italians is below. For all 
the foreign children combined the median is the same as for the 
American. On the National Test all the medians fall below the Ameri- 
can median. A similar relationship holds in both tests for the upper 
and lower quartile points. 

Our best comparison of the tests can be obtained by a study of the 
percentage of any foreign group reaching or exceeding the median of 
the American group. These percentages are as follows: 


1 The writer wishes here to thank Miss Martha Wilson, Principal of Public 
School 127, Manhattan, for her assistance in obtaining the necessary information 
as to the nationality of the children, and also for her kindness and cooperation 
while the tests were being given in her school. 
Percentage of foreign group reaching or exceeding Median Mental 
Age of American group on tests: 

                                  NON-LANGUAGE    NATIONAL INTELLIGENCE 
Italian . . . . . . . . . . . .        43                  36 
[Polish or German; label illegible]    61                  41 
[Polish or German; label illegible]    62                  36 
All foreign . . . . . . . . . .        50                  37 
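
The comparison used above, the percentage of one group reaching or exceeding the median of another, may be sketched as follows. This is only an illustration: the function name and the mental ages (in months) are hypothetical and are not the data of the study.

```python
from statistics import median


def percent_reaching_median(group, reference):
    """Percentage of `group` scoring at or above the median of `reference`."""
    ref_median = median(reference)
    reached = sum(1 for value in group if value >= ref_median)
    return 100.0 * reached / len(group)


# Hypothetical mental ages in months (illustrative only, NOT the study's data).
american = [100, 104, 108, 112, 116, 120]
foreign = [96, 102, 110, 111, 118, 122]

print(percent_reaching_median(foreign, american))
print(percent_reaching_median(american, american))
```

A value near 50 indicates that the two distributions are centered alike; a value well below 50 indicates that the group lies lower than the reference group, though the overlapping may still be considerable.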


Here we see that there is no difference between the foreign group 
as a whole and the American group on the Non-language Test. The 
curves of distribution are practically identical, as can also be seen from 
Table I, where the medians and Q’s are all the same. On the National 
Test, however, the difference between the American and foreign group 
is quite marked. Only 37 per cent of the latter reach the median of 
the American group. The two groups therefore, are markedly differ- 
entiated on this test, although the overlapping is still considerable. 
When we examine the different foreign groups, we see that the Italian 
group, which is large enough to afford a fair comparison with the 
American group, falls below the American on both the Non-language 
and the National Tests. The difference between the two groups on 
the Non-language Test is not very great but there is still a difference. 
This seems to be in agreement with the majority of studies of this 
national group. All reports indicate the inferiority of the Italians on 
all kinds of intelligence tests, but the writer is inclined to believe that 
the discrepancy between the groups as usually shown by means of 
verbal tests over-emphasizes greatly the intelligence difference between 
Italians and Americans. The present data, although slight, support 
the previous results reported by the writer¹ and they would seem to him 
to indicate caution in drawing conclusions as to the intelligence of 
foreign children when tested solely by means of tests which presuppose 
the understanding or reading of the English language. 

1 Pintner, R., and Keller, R. Intelligence Tests of Foreign Children. J. of 
Ed. Psych., Vol. XIII, No. 4, April, 1922, pp. 214-220. 
MENTAL AGE EQUIVALENTS FOR A GROUP OF NON-READING 
TESTS OF THE HERRING REVISION 
OF THE BINET-SIMON TESTS 

CHARLES F. WILNER 


Research Assistant, Bureau of Educational Research, 
Bloomsburg (Pennsylvania) State Normal School 


Despite the fact that the Herring Revision was intended primarily 
for the classification of normal children, it has come into some use in 
the psychological clinic for the purpose of rating defectives. Dr. 
Grace H. Kent, psychologist of the Worcester State Hospital, states 
that the usefulness of the Herring Revision for this purpose is lessened 
because of the number of tests which require reading ability on the 
part of the subject, since children who can read with fair fluency are 
rarely sent to the clinic. Miss Kent finds, however, that 16 of the 
non-reading tests of the Herring Revision are well suited to the work 
of the psychological clinic, and it is at her suggestion that the attempt 
has been made to obtain mental age equivalents of scores in these tests 
alone. The 16 tests selected by her are Nos. 1, 5, 7, 8, 9, 12, 14, 18, 19, 
24, 25, 26, 31, 32, 33, 34. 

There were available for the derivation of mental age equivalents 
for the Kent Group, records of 270 persons who had taken both the 
Stanford and the Herring Revisions. Of these, 154 were those used 
in the original standardization of this Revision. This group included 
children from the Garden City public schools, Scarboro School, and 
Letchworth Village Institution for the Feeble-minded. The examiners 
were Miss Grace Taylor, Raymond H. Franzen and John P. Herring. 
The other 116 examinees were 12-year-old children from the public 
schools of Bloomsburg, Pennsylvania. All 12-year-olds in this public 
school system, except a few who were absent when the examinations 
were given, were included in this group. The examiner was Mrs. 
Marjorie H. Wilner. 

The norms were derived by equating the decile points of the 
distribution of scores in the Kent Group with the corresponding decile 
points of the distribution of Stanford mental ages. These decile points 
were as follows: 

DECILE    STANFORD MA'S (MONTHS)    HERRING (KENT) SCORES 
   1             82                       39 
   2             87                       42.71 
   3             97                       51.25 
   4            118.5                     59.8 
   5            131.5                     63.58 
   6            138.33                    65.82 
   7            147.67                    68.33 
   8            155.5                     70.89 
   9            173                       73.43 


The resulting relation line was then smoothed by taking as the true 
value of each step, the arithmetic mean of its value and the values of 
the 5 successive steps each side of it. Values for points below the first 
decile and above the ninth decile were found by rectilinear extrapola- 
tion beyond these points. This gave the series of mental age 
equivalents in the following table: 


MENTAL AGE EQUIVALENTS 
Kent Group, Herring Revision of Binet-Simon Tests 

Score   MA  |  Score   MA  |  Score   MA  |  Score   MA  |  Score   MA 
   1    33  |    17    53  |    33    74  |    49    95  |    65   136 
   2    34  |    18    55  |    34    76  |    50    97  |    66   140 
   3    35  |    19    56  |    35    77  |    51    98  |    67   143 
   4    37  |    20    57  |    36    78  |    52   100  |    68   147 
   5    38  |    21    58  |    37    79  |    53   102  |    69   150 
   6    39  |    22    60  |    38    81  |    54   104  |    70   155 
   7    40  |    23    61  |    39    82  |    55   107  |    71   161 
   8    42  |    24    62  |    40    83  |    56   109  |    72   167 
   9    43  |    25    64  |    41    85  |    57   112  |    73   172 
  10    44  |    26    65  |    42    86  |    58   114  |    74   178 
  11    45  |    27    66  |    43    87  |    59   117  |    75   184 
  12    47  |    28    68  |    44    89  |    60   120  |    76   190 
  13    48  |    29    69  |    45    90  |    61   123  |    77   196 
  14    49  |    30    70  |    46    91  |    62   126  |    78   202 
  15    51  |    31    72  |    47    92  |    63   130  |    79   208 
  16    52  |    32    73  |    48    94  |    64   133  |    80   214 
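
The derivation described above (equating decile points, smoothing, and rectilinear extrapolation) may be sketched roughly as follows. The decile points are those tabulated in the text; the straight-line interpolation between deciles, the truncated averaging window near the ends of the scale, and the function names are assumptions of this sketch, not the author's stated procedure, so the output should not be expected to reproduce the published table exactly.

```python
# Decile points from the text: (Herring-Kent score, Stanford MA in months).
DECILES = [
    (39.0, 82.0), (42.71, 87.0), (51.25, 97.0), (59.8, 118.5), (63.58, 131.5),
    (65.82, 138.33), (68.33, 147.67), (70.89, 155.5), (73.43, 173.0),
]


def equated_ma(score):
    """Mental age for a score: linear between deciles, straight-line beyond them."""
    if score <= DECILES[1][0]:
        # At or below the second decile: use (and extend) the first segment.
        (x0, y0), (x1, y1) = DECILES[0], DECILES[1]
    elif score >= DECILES[-2][0]:
        # At or above the eighth decile: use (and extend) the last segment.
        (x0, y0), (x1, y1) = DECILES[-2], DECILES[-1]
    else:
        for (x0, y0), (x1, y1) in zip(DECILES, DECILES[1:]):
            if x0 <= score <= x1:
                break
    return y0 + (score - x0) * (y1 - y0) / (x1 - x0)


def smooth(values, half_width=5):
    """Mean of each value and up to `half_width` neighbours on each side."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


scores = list(range(1, 81))
raw = [equated_ma(s) for s in scores]
smoothed = smooth(raw)

for s in (20, 40, 60, 65):
    print(s, round(raw[s - 1], 1), round(smoothed[s - 1], 1))
```

The window of eleven steps follows the statement that each value was averaged with the five steps on each side of it; how the ends of the series were treated is not stated, and the shortened window used here is only one possible choice.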
A comparison of the mental ages derived from the Kent group 
alone, with the Stanford mental ages showed a coefficient of correla- 
tion of 0.9527 (Pearson product moment; data grouped in class inter- 
vals of 5). The mean of the Herring-Kent mental ages was 127.05 
months, of the Stanford 127.15. The SD of the Herring-Kent MA’s 
was 35.35 months, of the Stanford 35.30 months. The coefficient of 
correlation between the Kent Group and Group E, of the Herring was 
0.9570. The Mean MA of Group E was 124.95; the SD 36.05. (In all 
cases above, n = 270.) 

From these data it is concluded that: 

1. Certain non-reading tests of the Herring Revision may be used 
alone for the purpose of estimating a mental age. 

2. The mental ages derived from these tests will have the same 
meaning as, and be comparable with mental ages derived from the 
complete Herring Revision or from the Stanford Revision. 
COMMUNICATIONS AND DISCUSSIONS 


To the Editors of the Journal of Educational Psychology: 


I have just read with great interest the article by Rena Stebbins 
and L. A. Pechstein in the October number of your journal. I think the 
main theses and conclusions of the authors are very true and worthy 
of emphasis. However I deprecate for two reasons the appearance 
at this time in your journal of the table on page 388. My most seri- 
ous objection is that this table is based upon provisional norms origi- 
nally furnished by the National Research Council when the National 
Intelligence Tests were first published, but which were superseded in 
the National Intelligence Tests Manual, 1921 Revision. I believe 
that sufficient harm has already been done by persons using the provi- 
sional norms given in the Manual for 1920. The later Manual shows 
that the old norms not only were too high for all ages, but were particu- 
larly high for the younger children. The use of such inexact norms 
results both in the lowering of the mental ages found for all children, 
and in especially penalizing the younger ones. Deductions based 
upon such results lead to false conclusions. IQ’s are too low, AQ’s are 
too high, and teachers of older children suffer by comparison with those 
of younger children. These facts will explain some of the findings in 
the study by Miss Stebbins and Dr. Pechstein, such as those given in 
Table II, page 394, in which the average IQ’s for 87½ per cent of 
the 16 groups studied are below 100; in which the AQ’s are all 100 or 
over; and in which the lower grades studied have on the average 
considerably higher AQ’s than the higher grades. 

Another objection I have to the table on page 388 is that it inter- 
prets the age of eight years as given in the National Intelligence Tests 
Manual as being equivalent to eight years, no months, instead of eight 
years, six months, and so on for each age following. Although the 
1920 Manual does not definitely state what its meaning is in this 
regard, common usage among research workers would lead us to the 
latter interpretation, and the 1921 Manual does definitely state that 
this is the correct one. If the findings of Miss Stebbins and Dr. 
Pechstein were revaluated with this latter error of interpretation of 
scores removed, another lowering of their AQ’s would occur. 
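
The direction of both effects may be illustrated with a small computation. It is a sketch only: the child and all ages (in months) are hypothetical, and AQ is taken here as educational age divided by mental age, a definition which the letter itself does not spell out and which is therefore an assumption.

```python
def iq(mental_age, chronological_age):
    """Intelligence quotient: mental age over chronological age, times 100."""
    return 100.0 * mental_age / chronological_age


def aq(educational_age, mental_age):
    """Accomplishment quotient (assumed definition): educational age over mental age."""
    return 100.0 * educational_age / mental_age


# Hypothetical child; all ages in months (illustrative values only).
chronological_age = 108
educational_age = 108
ma_provisional = 102            # mental age read from norms that are too high
ma_revised = 108                # same score read from the revised norms
ma_midyear = ma_revised + 6     # "eight years" read as 8-6 instead of 8-0

for label, ma in [("provisional", ma_provisional),
                  ("revised", ma_revised),
                  ("revised, mid-year", ma_midyear)]:
    print(label,
          round(iq(ma, chronological_age), 1),
          round(aq(educational_age, ma), 1))
```

Norms that are too high give a lower mental age for the same score, so the IQ falls and the AQ rises; each correction that raises the mental age lowers the AQ again.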

Their article is of value in setting forth method. However it is 
illustrative also of the great care which must be taken in the choice and 
interpretation of norms when this method is pursued. I, myself, 
fell into error, through the use of these same norms, in a study I made 
of IQ’s and AQ’s in January, 1921, before the publication of the revised 
Manual. I should like to save others from similarly misinterpreting 
their data, through the use of provisional norms, which have since been 
superseded. 
Yours truly, 


KATHARINE MURDOCK. 
Halekulenu, Honolulu, T. H. 



NOTE ON THE USE OF SPEARMAN’S PROPHECY 
FORMULA FOR RELIABILITY 


KARL J. HOLZINGER 


University of Chicago 

One of the most important laws which has come to be recognized in 
the preparation of test material is that a long test is in general more 
reliable than a short one. Reliability may be here defined as the con- 
sistency with which a test measures what it purports to measure, the 
consistency being indicated by the correlation between two applications 
of the same test or of equivalent forms. Spearman¹ and later 
Brown have expressed the degree of reliability to be expected by lengthening 
tests in a formula which may be written 

$$r_{NN} = \frac{N r_{xx}}{1 + (N - 1) r_{xx}}$$

where $r_{NN}$ is the reliability coefficient on pooling $N$ tests of equal 
length and reliability, and $r_{xx}$ the reliability coefficient of the 
individual tests (or average of several). 

When $r_{xx}$ has been determined, the above formula is a function 
of $r_{NN}$ and $N$ so that it appears a simple matter to find $N$ for any 
required reliability $r_{NN}$; e.g., suppose $r_{xx} = 0.5$ and we wish the final 
lengthened test to have a reliability of 0.9. The equation then becomes 

$$0.9 = \frac{0.5N}{1 + 0.5(N - 1)}, \quad \text{or } N = 9,$$

so that it will be necessary to 
increase the test to nine times its original length to secure the desired 
reliability of 0.9. It is further evident that when $r_{xx}$ has any value 
except zero that $r_{NN}$ approaches +1 as a limit for 

$$\lim_{N \to \infty} r_{NN} = \lim_{N \to \infty} \frac{r_{xx}}{\frac{1}{N} + \frac{N - 1}{N} r_{xx}} = +1$$

This would lead one to expect that by continual lengthening we could 
approach perfect reliability as closely as we please. Experience with 
tests and children, however, shows that this is absurd. The law overstates 
the reliability to be expected, and it is important to know how 
much, where in the series, and why such overprediction occurs with 
given types of material. 
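
The formula and the worked example above may be checked with a short computation. This is a sketch; the function names are hypothetical, and the second function is simply the same formula solved algebraically for N.

```python
def spearman_brown(r_xx, n):
    """Predicted reliability when a test of reliability r_xx is lengthened n times."""
    return n * r_xx / (1 + (n - 1) * r_xx)


def length_needed(r_xx, r_target):
    """Lengthening factor N required by the formula to reach r_target."""
    return r_target * (1 - r_xx) / (r_xx * (1 - r_target))


# Worked example from the text: r_xx = 0.5, desired reliability 0.9.
print(round(spearman_brown(0.5, 9), 4))
print(round(length_needed(0.5, 0.9), 4))

# The predicted reliability creeps toward +1 as N grows without limit.
for n in (10, 100, 1000):
    print(n, round(spearman_brown(0.5, n), 4))
```

The last loop shows the behavior objected to in the text: nothing in the formula itself prevents the predicted reliability from approaching unity as closely as one pleases.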

In the present experiment the reliability coefficients of the ten 
components of the Terman Scale were determined from forms A and 
B on a group of 135 pupils. The results are given in Table I. The 





1 Spearman, C.: Correlation of Sums and Differences. Brit. Jr. Psy., Vol. V, 
pp. 417-426. 
TABLE I.—RELIABILITY COEFFICIENTS FOR THE TERMAN SCALE BY COMPONENTS 
AND TOTAL SCORE 

Figure 1.—Theoretical and Experimental Reliability 
Trends based on the ten Terman Components. 

individual components are not equivalent as to material and length, 
but are sufficiently so for present purposes. The mean reliability 
coefficient is 0.679 and the correlation by pooling, 0.916. Furthermore 
this last value is greater than for any individual component so that 
there is clear evidence of increased reliability by lengthening of the 
material. 

It is now possible to answer the question as to how much the prediction 
formula overstates the expected result from pooling. Substituting 
$r_{xx} = 0.68$ in this equation gives $r_{(10)(10)} = 0.96$, whereas the 
obtained value is less than 0.92. The overprediction then amounts to 
about 0.04, a very considerable difference with such high correlation. 

In order to determine where the overstatement occurs it is neces- 
sary to apply Spearman’s formula to successively pooled components. 
Thus the first component of form A is correlated with the first in form 
B, then tests 1 and 2 of A with 1 and 2 of B, and so on until all ten of 
one form have been correlated with the ten of the other form. Theoretical 
and experimental values may then be compared as various numbers 
of tests are pooled. As a check the components were also 
amalgamated in the reverse order. Table II and the accompanying 
diagram show the results. 


TABLE II.—THEORETICAL AND EXPERIMENTAL RELIABILITY COEFFICIENTS 
OBTAINED FROM SPEARMAN’S FORMULA AND BY SUCCESSIVE CUMULATION 
OF THE TEN TERMAN COMPONENTS 

Number of tests   Theoretical     Order of cumulation 
   cumulated         value        1 to 10     10 to 1 
       1              .68           .64         .70 
       2              .82           .81         .79 
       3              .87           .87         .83 
       4              .90           .91         .87 
       5              .92           .90         .84 
       6              .93           .88         .86 
       7              .94           .89         .87 
       8              .94           .87         .87 
       9              .95           .91         .90 
      10              .96           .92         .92 

The table and diagram indicate that for the first order of cumula- 
tion the formula gives a good prediction up to four components, but 
that thereafter considerable overstatement occurs. This might 
be accounted for in part by the fact that the first five individual 
reliability coefficients as seen in Table I are higher than the last five, 
so that the amalgamation of the latter tests might not be expected to 
increase the reliability any further. The reverse cumulation, however, 
shows that rapid increase in reliability for the last four tests is not due 
primarily to high individual coefficients. Furthermore the addition 
of such components as 4 with a reliability of 0.9 does not increase the 
trend appreciably when the amalgamation is made late in the series; 
e.g., in cumulating from 1 to 10 the addition of test 4 raises the trend 
0.04, but in cumulating from 10 to 1 it raises it only 0.01. 

The general result then appears to be that reliability increases very 
rapidly with the first four or five tests pooled, but increases thereafter 
more slowly than the prediction formula would lead us to expect. 
Moreover the trend is determined chiefly by the number of tests 
cumulated and is not affected appreciably by highly reliable 
components pooled late in the series. 
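
The amount of overstatement at each step may be read directly from Table II by subtracting the experimental from the theoretical values. The sketch below does only this arithmetic; the values are transcribed from the table, and the variable and function names are hypothetical.

```python
# Values transcribed from Table II (number of tests cumulated = 1 .. 10).
theoretical = [.68, .82, .87, .90, .92, .93, .94, .94, .95, .96]
forward = [.64, .81, .87, .91, .90, .88, .89, .87, .91, .92]   # order 1 to 10
reverse = [.70, .79, .83, .87, .84, .86, .87, .87, .90, .92]   # order 10 to 1


def overprediction(predicted, observed):
    """Theoretical minus experimental coefficient, rounded to two places."""
    return [round(p - o, 2) for p, o in zip(predicted, observed)]


over_forward = overprediction(theoretical, forward)
over_reverse = overprediction(theoretical, reverse)

for n, (f, r) in enumerate(zip(over_forward, over_reverse), start=1):
    print(n, f, r)
```

In the first order of cumulation the differences stay within about 0.04 of zero through four components and rise to as much as 0.07 thereafter, which is the overstatement described in the text.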

In order to study the reasons for such over-prediction it is neces- 
sary to examine and test the assumptions underlying the proof of the 
formula. This will not be attempted in the present paper, but it may 
be sufficient to note here that Brown omits this equation from the 
recent edition of his text on mental measurements. 
REPORTED BY CECILE COLLOTON 
Department of Educational Psychology, The Lincoln School of Teachers College 

INTELLIGENCE TESTS 

Mental Tests as an Aid in the Analysis of Mental Constitution. Harry J. 
Baker. Journal of Applied Psychology, 1922, December, 349-377. Condensation 
of a Ph.D. dissertation. Twenty-six tests of general intelligence and specific 
abilities were administered to 25 high school students and 25 college students. 
Eight individual cases are reported in detail and 31 general conclusions are made. 
Bibliography gives 36 references. 

A Comparison of Three Tests of “General Intelligence.” Morris S. Viteles. 
The Journal of Applied Psychology, 1922, December, 391-402. A study of the 
performance of 59 students of the Wharton School of Finance and Commerce 
of the University of Pennsylvania on the Otis General Intelligence Test, Army 
Alpha, and the Morgan Mental Test. Great variability in the different tests and 
lack of correlation between test results and school grades is noted. 

Freshmen Grades and Mental Tests. W.G. Binnewies. Educational Adminis- 
tration and Supervision, 1923, March, 161-162. A study of freshman college 
grades and group intelligence tests shows a correlation of about 0.4. 

An Initial Inventory of the Mental Capacities of Primary Children. H. E. 
Vander Zalm. Education, 1923, March, 440-445. A general discussion of the 
need for classification by mental tests in the primary grades. Special mention is 
made of the Detroit First Grade Intelligence Test. 

Group Intelligence Examinations for Primary Pupils. O. J. Johnson. The 
Journal of Applied Psychology, 1922, December, 403-416. Part I lists all existing 
primary examinations and compares them in detail as to directions, methods of 
scoring, kinds of tests, etc. Part II describes fully the Non-verbal 2 Intelligence 
Examination for Primary Pupils. 

Measures of General Intelligence as Indices of Success in Trade Learning. Carl 
M. Cowdery. Journal of Applied Psychology, 1922, December, 311-330. Reports 
a study of the boys at the Whittier State School, Whittier, California, who are 
engaged in the learning and performing of trade work. Individual intelligence 
tests and a three years accumulation of ratings on 22 different trade groups make 
up the data of the study. 

The Relation of Intelligence to Age in Negro Children. Ada Hart Arlitt. The 
Journal of Applied Psychology, 1922, December, 378-384. One hundred and 
eighty negro children of New Orleans and 63 of Philadelphia tested by the Stan- 
ford-Binet show that at ages five and six, negroes are superior to whites of the 
same social status. Beyond six, negroes become increasingly inferior with age. 

The Influence of Certain Exercises in Silent Reading on Scores in the Otis Group 
Intelligence Test. Wendell White. Educational Administration and Supervision, 
1923, March, 179-182. Significant increases in test scores are obtained after 
drill on special reading exercises designed solely to develop speed in reading. 

Improvement in Teachers’ Estimates of Intelligence. W. D. Buchanan. The 
Elementary School Journal, 1923, March, 542-546. Training teachers to ignore 
school achievement and character traits in estimating intelligence brings about a 
high correlation between teacher’s ratings and test scores. 

A Criterion of the Quality of Teaching. Dudley W. Willard, and Curtis T. 
Williams. Educational Administration and Supervision, 1923, March, 147-159. 
A comparison of teachers’ marks and the test scores of 236 eighth grade and high 
school pupils of Kent, Washington, on the Terman Group Test of Mental Ability. 
Interviews with teachers concerning basis of grading are reported in detail. 

Intelligence Levels among State Normal School Graduates. Frederick L. Whit- 
ney. Journal of Educational Research, 1923, March, 229-235. Studies of the 
intelligence of normal school students and graduates based on the Army Alpha 
show favorable comparisons with other college students and professional and 
occupational groups. 

EDUCATIONAL TESTS 

The Development and Comparative Values of Composition Scales. Earl Hudel- 
son. The English Journal, 1923, March, 163-168. Lists and describes the 
various composition scales devised from 1903 to the present time. 

A Comparative Study of the Vocabulary Content of Certain Standard Reading 
Tests. H.L. Ballinger. The Elementary School Journal, 1923, March, 522-534. 
A comparison of the words in 14 well known reading tests, the Thorndike Word 
List, the Horn Word List, and the vocabulary content of 30 first, second and 
third grade readers. Eleven words are common to the 14 tests. Of the 2039 
words in the 14 tests, 1106 appear only once in either the Thorndike or the Horn 
list. 

On Improving Algebra Tests. David Eugene Smith. Teachers College Record, 
1923, March, 87-94. Examples from various algebra tests quoted and criticised. 

An Experiment to Determine the Effectiveness of Practice Tests in Teaching 
Beginning Reading. Nila Banton Smith. Journal of Educational Research, 
1923, March, 213-228. A description of a new method for teaching beginning 
reading used in Detroit with great success. 

Some Limitations of Educational Tests. V. A. C. Henmon. Journal of Educational 
Research, 1923, March, 185-198. Comparative studies of different 
tests in American history, algebra, and reading show the unreliability of the tests 
as a measure of individual achievement. 

Spelling Age Computed from the Score on Fifty Per Cent Lists. Walter E. 
Morgan. Journal of Educational Research, 1923, March, 236-243. Describes 
a technique for computing spelling age in Grades II to VIII using the Bucking- 
ham-Ayres Spelling Scale. 


MENTAL AND EDUCATIONAL MEASUREMENTS 


The Educational Significance of Mental Tests. B. H. Bode. Journal of Edu- 
cational Research, 1923, February, 91-99. Democratic education must build 
upon the various interests and abilities disclosed by mental tests. A super- 
structure of common faith and common knowledge. 

A Study of the Use of the Stanford Revision of the Binet-Simon Test as a Guide 
to Selection of High School Courses. Sara E. Weisman. Journal of Educational 
Research, 1923, February, 137-144. Reports the achievement of 30 pupils in 
courses selected for them on the basis of the abilities disclosed by the Binet test. 
Thirteen case studies are given. 

Training Teachers for Mental Testing, in Oakland, California. Virgil E. Dick- 
son, and Elise H. Martens. Journal of Educational Research, 1923, February, 
100-108. Discusses the Oakland program for training teachers for mental test- 
ing. Describes the methods of training and gives detailed data on the results. 

Rating Students on the Basis of Native Capacity and Accomplishment. Ira 
A. Flinner. Educational Administration and Supervision, 1923, February, 
87-98. The use of group tests, teachers’ estimates, and individual tests in classi- 
fying boys in a college preparatory school and in comparing native capacity and 
accomplishment. 

The Education of Mental Defectives in the Public Schools of Seattle. Harlan C. 
Hines. School and Society, 1923, February 24, 216-221. How children of IQ’s 
from 55 to 80 are trained along industrial lines. Details of classification, class 
work, and follow-up work are given illustrated by case studies. 

Teaching and Following-up Supernormal Children in a Small Public School. 
Julia F. Keaney. Journal of Educational Research, 1923, February, 145-148. 
Tells how the curriculum was enriched for a group of 27 boys of high IQ in a New 
York public school. 

Is Scientific Vocational Guidance Possible? John M. Brewer. School and 
Society, 1923, March 10, 262-266. Discusses what use has been made of scientific 
method in the field of vocational guidance and what remains to be done. 

A Few Suggestions for Informal Testing in Geography. Edith P. Parker. The 
Elementary School Journal, 1923, February, 444-447. Testing children’s knowl- 
edge of geographic principles by new pictures, maps and reading references. 

The Validity of Arithmetical-reasoning Tests. R. V. Hunkins and F. S. Breed. 
The Elementary School Journal, 1923, February, 453-466. Reports a study of 
seven arithmetic tests in general use. Data secured from 127 children in Grades 
V to VIII, Hot Springs, So. Dakota. Conclusions show Stone Reasoning Test 
to be most valid. Buckingham’s Scale for Problems in Arithmetic too difficult. 
Monroe and Stone most useful for diagnosis of individual difficulties. 


MISCELLANEOUS 


Scientific Tests in Education and Their Use. George C. Kyte. Educational 
Administration and Supervision, 1923, March, 163-172. How the many prob- 
lems of promotion, classification, and diagnosis can be solved by the classroom 
teacher through the use of mental and educational tests. 

Individual Injustice and Guessing in the True-false Examination. J. Crosby 
Chapman. Journal of Applied Psychology, 1922, December, 342-348. A caution 
against too much dependence on the operation of chance in true-false examinations. 
Four tables present interesting data on scores in hypothetical examinations 
ranging from 30 to 90 items. 


When Children Read for Fun. Jenny Lind Green. School and Society, 1923, 
April 7, 390-392. Reports an experiment to determine how reading for fun can 
be affected by direct training in choice of material. 

The Growth of Children as Influenced by Environmental and Hereditary Condi- 
tions. Franz Boas. School and Society, 1923, March 17, 305-308. Growth 
curves are influenced by different social environments but the effect of the 
hereditary growth curve is greater. 

Mental Fatigue of Mixed and Full Blood Indians. Thomas R. Garth. Journal 
of Applied Psychology, 1922, December, 331-341. An experiment conducted 
with 106 full blood Indians and 80 of mixed blood shows that the full bloods are 
more willing to put forth effort and resist fatigue more successfully. 

Education for Democracy. Alma Paschall. Educational Review, 1923, 
April, 225-227. Discusses needed changes in the public school system to pro- 
vide for individual abilities and higher ethical ideals. 

A Controlled Experiment to Determine the Extent to Which Latin can Function 
in the Spelling of English Words. Warren W. Coxe. Journal of Educational 
Research, 1923, March, 244-247. Describes one of the investigations being 
carried on by the Advisory Committee of the American Classical League. 

The Construction and Interpretation of Correlation Tables. E. L. Thorndike. 
Journal of Educational Research, 1923, March, 199-212. Explanation and 
illustration of a method of making correlation tables from given hypotheses about 
the causes producing correlation. 

Home Conditions of Study and Pupil-Attitude toward School Work: A
Sampling. F. T. Clayton. School and Society, 1923, February 24, 221-224. Results
of a questionnaire answered by 645 high school pupils of Concord,
N. H. Twelve tables present detailed data. No definite tendencies are revealed 
by the study. Need for similar studies in other cities. 

The Meaning of Behavioristic Psychology for Education. J. Herbert Black- 
hurst. Educational Review, 1923, March, 148-150. Stresses the importance of 
building up desirable reaction patterns early in the life of the child. 

Problems of College Admission. Alexander C. Roberts. School and Society, 
1923, March 3, 246-252. Will the so-called two-thirds rule be an adequate 
and just scheme of admission to college? A study of the high school marks and 
university records of 1129 individuals answers this question and raises others. 

The Progress of Kindergarten Pupils in the Elementary Grades. W. J. Peters. 
Journal of Educational Research, 1923, February, 117-126. Reports a study of 
the school progress of 374 fifth-grade children exactly half of whom had attended 
kindergarten before entering first grade. Kindergarten expedites school life. 

The Reading Vocabularies of Third-grade Children. C. A. Gregory. Journal 
of Educational Research, 1923, February, 127-131. Reports the results of a study 
of the minimum requirements of the state course of study of Oregon to deter- 
mine the minimum reading vocabularies of third-grade children. 5000-6000 
words a conservative estimate. 

A Basic List of Phonics for Grades I and II. Mabel Vogel, Emma Jaycox, and 
Carleton W. Washburne. The Elementary School Journal, 1923, February, 
436-443. A study of the frequency of occurrence of the phonograms in certain 
vocabulary counts and readers with reference to the determination of a minimal 
content of phonics for first and second grade. 

Phonics and No Phonics. Lillian Beatrice Currier. Elementary School 
Journal, 1923, February, 448-452. A report of a 5-year experiment eventuating 
in 6 significant conclusions or recommendations. 

A Textbook Score Card. E. M. Otis. Journal of Educational Research, 1923, 
February, 132-136. Describes a score card designed for judging the value of 
informational text books. A list of “standards” defines each item on the card.

NEW PUBLICATIONS IN EDUCATIONAL PSYCHOLOGY AND RELATED FIELDS OF EDUCATION

CONDUCTED BY LAURA ZIRBES¹


1. An Introduction to Psychology.—To the student of psychology 
who is not afraid of a fairly difficult book, this new volume² by
McDougall can be recommended. It is designed to introduce the 
student to his science, but it is doubtful whether any teacher would 
care to begin with a book so polemical in nature, a book that presents 
so much of the theoretical background of psychology and which omits 
a great deal of the content which, rightly or wrongly, is at present sup- 
posed to constitute a course in psychology. To the more advanced 
student, rather than to the beginner, the book will prove valuable. It 
presents a well-reasoned account of a “purposive” psychology and 
does not hesitate to challenge the structural and behavioristic psy- 
chologies in favor at the present time. 

The book is excellently written. It does not begin with a long 
account of the nervous system, for which one reviewer at least is 
thankful. The approach of the author is from the study of the be- 
havior of the lower animals up to a study of human behavior. 
Instinct occupies, of course, a dominant role. Habit receives some 
attention. Although there is no application of any of these topics to 
educational theory or practice, the serious student of educational 
psychology will read with interest and profit this well-knit presentation 
of a system of psychology. R. P. 














2. The Measurement of Teachers.—To assert that age, experience, 
quality of handwriting, intelligence as measured by tests and normal 
school scholarship are singly of importance in predicting the degree of 
success of grade school teachers is to argue from opinion rather than 
knowledge. Dr. Knight³ in a laborious study shows that none of
these correlate above 0.15 with teaching ability of 153 teachers. 


1 All unsigned reviews are prepared by Laura Zirbes. 

2 McDougall, William: “Outline of Psychology.” New York: Scribners,
1923, pp. 456. Price $2.50. 

3 Knight, F. B.: Qualities Related to Success in Teaching, T.C. Contributions
to Education, No. 120, New York, 1922, pp. 67.

Amount of study while in service and ability to pass a professional
teaching test correlate with teaching success in the neighborhood of 
0.33 and 0.60 respectively. By partial correlation analysis it is shown 
that the latter is the sine qua non of teaching success. This the author 
believes is the direct result of the teacher’s interest in and application 
to her job, and may be largely independent of amount of experience, 
age, etc. Salary, when based on merit rather than experience, also 
correlates in the neighborhood of 0.4 with teacher merit; the author 
does not say which is the cart and which the horse. 
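
The partial correlation analysis mentioned above may be illustrated by a brief sketch. The first-order formula is the standard one; the two correlations with teaching success (0.33 and 0.60) are those quoted in this review. The intercorrelation between amount of study and the professional test is not reported here, so the value 0.40 below is purely a hypothetical assumption, as are the function and variable names.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Correlations with teaching success as quoted in the review.
r_success_study = 0.33
r_success_test = 0.60
# ASSUMPTION: not given in the review; chosen only for illustration.
r_study_test = 0.40

# Study vs. success, with the professional test held constant.
study_given_test = partial_corr(r_success_study, r_study_test, r_success_test)
# Test vs. success, with amount of study held constant.
test_given_study = partial_corr(r_success_test, r_study_test, r_success_study)

print(round(study_given_test, 3), round(test_given_study, 3))
```

Under this assumed intercorrelation the test keeps most of its relation to success when study is held constant, whereas study keeps little of its own. A different assumed value would alter the figures, so the sketch shows only the form of the argument and not Dr. Knight's actual results.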

Of immense practical value is the conclusion that all teacher rating 
schemes, based on analyzed traits rather than “general merit,” are
subject to the “halo” effect: the rater gets a general idea of the person
rated and then allows his general idea to affect his consideration of the 
separate traits into which he might seek to analyze teaching ability. 
In a New York City school district rating score card general intellec- 
tual capacity correlated with voice to the extent of 0.62! The conclusion 
might be drawn that score cards are at least highly wasteful of time if 
not substantially useless. 

It would seem to the reviewer that a composite of normal school 
success measured on common examinations, amount of study while 
teaching, standing on a common professional test, and salary attained 
in a common school system might be used to give a rather accurate 
composite measure of a teacher’s fitness for promotion. Unfortunately 
the author does not tell us what shall be done with the significant 
“tests.” A possible distinction between the use of them as prognostic 
tests and as measures of progress might be drawn. The number of 
high school teachers’ records investigated is too small to draw valid 
conclusions, but intelligence appears to be more important than test 
measured professional teaching ability. 

The study, not the first in the field, is an excellent first approach to 
the complicated problem of teacher measurement. 

Herbert A. Toops.

3. An Experimental Study of Complex Learning.—Using a test 
which embraces the features of the “multiple choice” experiments
frequently used with animals, a “checker puzzle” and the Tait
Labyrinth puzzle, Haught¹ has made an experimental and statistical


1 Haught, B. F.: The Interrelations of Some Higher Learning Processes. 
Psychological Monographs, Princeton: The Psychological Review Co., 1921, 
pp. 71.

analysis of the interrelations of some of the higher learning processes
and their association with general intelligence. Stanford-Binet 
scores are accepted as the criterion of intelligence—a somewhat ques- 
tionable practice with college students as subjects. The correlations 
of intelligence and the several learning tests are uniformly low for 
reasons not wholly clear inasmuch as the reliability of the learning tests 
is not determined. The intercorrelations of the learning tests are
also low. Several types of scores for the learning tests are carefully 
evaluated with reference to the criterion utilized. The author believes 
that the puzzle tests afford a measure of several important mental 
abilities such as to control attention over long periods, to keep the 
goal idea in mind without confusion, to systematically analyse very 
complex situations and other features of “rational” learning more
thoroughly than the Binet or other tests of intelligence. Inasmuch as
the intercorrelations of the several tests do not fall into hierarchies
consistent with the Spearman “general factor” formula the author is
disposed to think of intelligence not as a single power or quality but
as “various factors variously grouped for different situations.” On the
whole this is an admirable study of the higher mental processes em- 
bracing many suggestions of value to the specialist in psychology and 
mental testing. A. J. G. 





4. Research Work at Vineland.—No institution for the feebleminded
in this country has stimulated so much psychological research as has 
The Training School at Vineland. A long list of books, monographs 
and articles has come from that source. Now Mr. Porteus adds
another¹ to the list. This book represents the work he was engaged
in during the three years of his directorship of research. Much of it 
has already appeared in various monographs and articles by the author. 
It is well, however, to have it all in one volume, even although the topics 
are very diverse. There are studies dealing with anthropometry, 
intelligence tests and rating scales. Incidentally the author presents 
still another definition of feeblemindedness, which runs as follows: 
“A feebleminded person is one who by reason of mental defects, other
than sensory, can not attain to self-management and self-support to 
the degree of social sufficiency.” 

The chief anthropometric contribution is a study of brain capacity. 
The author presents data from over 2000 normal cases from ages 7 to 





1 Porteus, S. D.: Studies in Mental Deviation. No. 24, Publications of the 
Training School at Vineland, Department of Research, October, 1922.

20. He finds an increase from age to age. He does not, however,
raise the question of selection in the older ages above the public school, 
and it would be dangerous to accept his adult norms based on univer- 
sity students as typical of the population at large. The significance 
of the steady increase in brain capacity is, therefore, as hard to inter- 
pret as would be the steady increase in intelligence test scores which 
would be found in the population measured. 

In the chapters dealing with intelligence tests, we have further 
information about the Porteus Maze Tests, which are already well- 
known. The author emphasizes the measure of “planning capacity” 
which these tests give. They give higher correlation with industrial 
capacity and social adaptability than does the Binet. 

In the field of rating scales, we have a social rating scale and an 
industrial rating scale for the feebleminded. Both of these are rough 
and ready instruments, as the author realizes, but the work is valu- 
able and suggestive and will undoubtedly stimulate further research 
in this direction. The versatility of Mr. Porteus is further shown by 
his educational attainments scale for defectives, his form and assem- 
bling test and his revision of the Stanford-Binet Scale. 

From this brief description of the number of topics reported in the 
book, it can be readily imagined that much of the work is fragmentary 
in character, and from one point of view hardly worthy of being 
incorporated into a book. The only justification for much of it must 
be the hope that it may stimulate others to carry on where the author 
has left off. R. P.





5. Elements of Human Psychology.—One naturally compares this
new work¹ with the well-known and still recent (1919) Human Psychology,
by the same author. The purpose of the new and shorter
text is thus set forth in the preface: “This book was written to meet
numerous requests for an introductory textbook of psychology based 
on the functions of the nervous system. The standpoint is the same 
as that of ‘Human Psychology,’ which recognizes both the intro- 
spective and behavioristic methods. Material has been freely drawn 
from the earlier work, but the arrangement of topics is different and 
the treatment has been simplified. Most of the theoretical discussions 
are omitted and the practical applications of psychology are 
emphasized.”





1 Warren, Howard C.: “Elements of Human Psychology.” Boston: Houghton
Mifflin, 1922, pp. 416.

The abridgement, which amounts to about 15 per cent on the whole,
affects especially the chapters on the nervous system and the senses. 
A few topics, such as memory and the subconscious, are treated at 
considerably greater length than before. The figures are increased in 
number, with the object of clearing up difficult topics. The practical 
exercises are also more numerous, and, in addition, an appendix 
provides a full set of review questions, as well as a few pages of sug- 
gestions as to the best manner of teaching and studying the subject. 
A novel feature which will be much appreciated is the expansion of the 
index into a glossary, in which most of the terms used in the text are 
carefully defined. 

The author has gone minutely through the older text, with the 
object of simplifying and illuminating every statement. Scarcely a 
sentence is taken over bodily into the new text; almost always there is 
some change in the direction of adaptation to the needs of the beginner. 
The following short passage, as it appears in the two books, gives 
some idea of the care and skill shown by the author in this difficult 
task of simplification. In the “Human Psychology” we read:
“The inner ear or labyrinth is a very complicated cavity, only part of
which serves the auditory function. The dorsal portion contains the
semicircular canals and their appendages, which act as receptor for
the static sense.” In the “Elements,” this becomes: “The inner
ear or labyrinth is a very complicated cavity, only part of which is 
concerned with hearing. The portion toward the back of the head 
contains the semicircular canals, which are receptors for the static 
sense; they have nothing to do with hearing.” Much more extensive 
alterations than this occur constantly throughout the text. Two 
chapters in the earlier work which, taken together, outline the author’s 
system are, in the new book, combined into a compact view of the 
whole. There are frequent retrospective and anticipatory summaries, 
which serve admirably to keep the reader oriented. The style through- 
out is certainly as direct and free from unnecessary difficulties as it 
could well be made. 

As to subject-matter, the new book, like the old, is characterized 
by catholicity combined with system. When the author says that he
“recognizes both the introspective and behavioristic methods,” he 
means more than that he is willing to accept particular conclusions 
reached by either method. He means that his system of psychology 
has a logical place for each method and for the whole positive content 
of both introspective and behavior psychology.

The book is essentially a system of psychology. By definition,
“Human psychology is the science which deals with the interaction
between man and his environment by means of the nervous system and
its terminal organs, together with the mental events which accompany
this interplay.” This interaction of man and his environment consists
of responses to stimuli, and each response consists of three principal
parts: Reception of the stimulus by a sense organ, adjustment in
the nerve centers, and muscular or glandular activity. The behavior- 
istic method makes its contribution by examining the muscular or 
glandular response in connection with the stimulus, but it does not 
examine the important central process of adjustment. For informa- 
tion on the process of adjustment, we turn to anatomical and physio- 
logical study of the nervous system, and also to introspection. The 
conscious experiences that are examined introspectively are correlated
with the adjustment processes in the nerve centers. “In other
words, the psychologist can study his thoughts and memories, his 
perceptions and emotions, in place of the central nerve processes which 
accompany them.” 

The central neural process of adjustment consists of several compo- 
nent processes, to each of which corresponds a fundamental mental 
process. The list is as follows:


NEURAL PROCESS               MENTAL PROCESS
Excitation                   Impression
Conduction                   Suggestion (association)
Retention                    Revival
Fatigue (and freshness)      Attention
Collection                   Composition
Distribution                 Discrimination
Modification                 Transformation


Now what is examined introspectively consists of experiences (or 
mental states) which are built up out of sensations by the processes 
just listed. Experiences differ because they are compounded of 
different sensations. There are three main classes of sensations: 
Those from the external senses, those from the systemic senses (organic 
and pain senses), and those from the motor senses (muscle and static 
senses). Also, there are revived sensations, which are of importance 
only in case of the external senses. That gives four main classes 
of sensory elements out of which experiences are compounded, and, 
to correspond, there are four fundamental classes of experiences:
Perceptions, composed chiefly of external sensations; images or ideas,
composed chiefly of revived external sensations; feelings, composed
chiefly of systemic sensations; and conations, composed chiefly of
motor sensations. There are also secondary experiences: emotions, composed of
systemic plus motor sensations; sentiments, composed of systemic and
revived external sensations; volitions, composed of motor and revived
external sensations. Language and thought, which belong together, 
are like volitions in being compounded of motor sensations and revived 
external sensations, but in their case the motor sensations arise in the 
vocal organs in the course of social communication. 

Frequent repetition of the same sort of experience gives rise to a 
more or less permanent set of the nerve substance, corresponding to 
which is a definite mental attitude. Perceptions and ideations set 
into permanent interest (the cognitive or receptive attitude), feeling sets
into desire, and conation into attention (considered here as a motor 
attitude). These are the primary attitudes, based upon the funda- 
mental classes of experience. There are also secondary attitudes, 
based upon the secondary classes of experience. Thus, emotional atti- 
tudes, or dispositions (such as cheerfulness, cowardice, malice, loyalty) 
are based upon the repetition of similar emotions, and volitional atti- 
tudes, or proclivities (as perseverance and vacillation), are based upon 
repeated volitions. The repetition of similar intellectual processes 
generates such attitudes as the retrospective, the imaginative, the 
judicial, the analytic, and many others. 

“Character arises from the consolidation of attitudes into more 
permanent trends of life.” The consolidation of the intellectual attitudes
gives the intellectual phase of character, the feeling attitudes
compose temperament, the motor attitudes compose skill, and the
social attitudes constitute morality. These are the four phases of
character, and their summation constitutes the personality, “the
entire mental organization of a human being at any stage of his
development.” Thus personality is built up from elementary sensations
by a continued summation; and, in the same way, on the side of
motor response, there is a continued process of organization, from 
reflexes through instinctive acts, learned performances (intelligent 
behavior), and rational action, up to personal control of the entire 
situation with which one is concerned. 

As the preceding summary dimly suggests, the book represents a 
very determined effort at systematization. It is rather a remark- 
able performance in that line. In the present immature state of our 
science, to be sure, any thoroughgoing system of psychology is bound 
to be somewhat personal and arbitrary, and open to cheap and easy
criticism. For example, the author attempts to combine structural
and functional psychology in his system. He first defines the different 
classes of experience in purely structural terms, as, a perception is a 
compound of external sensations. But, as we advance into the chap- 
ter on perception, we begin to read of the perception of objects, of weight, 
of spatial relations. Here then, we are considering perception as a 
function which accomplishes certain results. Does the original 
structural definition of perception still hold good? No, for it develops 
that motor sensations are as important as any in the perception of 
weight, size, etc. The structural definition does not hold strictly, if at 
all, as soon as we begin to think in terms of function. Volition, 
similarly, is first defined in a purely structural way, as an experience 
compounded of motor sensations and revived external sensations. 
But another, functional definition is also given, according to which 
volition is “the kind of experience which accompanies ideomotor
actions.” According to the structural definition, I should experience a 
volition if I chanced to have a visual image while walking or even 
while being passively rotated; but, according to the functional defini- 
tion, this experience would not be a volition, because the motor 
sensations in question are not produced by any motor effects of the 
visual image. 

Regarded as a serious attempt to show that structural psychology 
can be taken over bodily into a psychology that is primarily a study of 
certain functions of the organism, Warren’s system thus leaves con- 
siderable room for doubt. As a text for a discussion group, where the 
stress is to be laid upon careful definition, scrutiny of implications, and 
logical system, the present book should serve admirably. Nor is it 
lacking in informational value, nor in practical hints. There are many 
judicious educational suggestions scattered throughout the book. 

R. S. Woodworth.
Columbia University. 

6. The Technique of Curriculum Construction.—This thoroughgoing
treatise¹ falls into two parts, the first, a statement of principles,
with their elaboration into procedures; the second, a compilation of 56 
studies from various sources, illustrating eight school subjects and 
miscellaneous fields which cannot be included under the first classifica- 
tion. The book should not only fill a need as a textbook for graduate 


1 Charters, W. W.: “Curriculum Construction.” New York: MacMillan,
1923, pp. XII + 352.

classes. Part I is a handbook of investigational techniques, and is
permeated with a philosophy which every curriculum worker will do
well to consider in critical comparison with his own or the one to which
he subscribes.





7. A Brief for the Pre-school Child.—The biological and psychological
significance of the first five years of life is clearly set forth in the
opening chapter of this book.¹ The history of infant and child welfare
work and of the nursery school movement is traced in its relation to the
home, the kindergarten and school entrance. 

Educational provision for handicapped children below school age 
is advocated both as a preventive and as a corrective measure. The 
book is full of constructive suggestions for the conservation of child- 
hood, the improvement of parental care and educational preparation 
for parenthood. The appendix contains a selected bibliography and 
a wealth of other pertinent data. This book is not a report of Dr. 
Gesell’s research project at the Yale Psycho-Clinic. It is a text 
comparable with Terman’s book, ‘‘The Hygiene of the School Child.” 
Its purpose is to indicate the vital interdependence between pre-school 
and pre-parental education. 

8. A Further Report on the Social Studies in the North Central 
Association.—This bulletin² supplements and brings down to date the
earlier reports by L. V. Koos and C. O. Davis on the teaching of history 
and citizenship in the secondary schools of the Middle West. (“The
Administration of Secondary Units.” The University of Chicago
Press, 1917 and Training for Citizenship in the North Central Second- 
ary Schools. The School Review, Vol. 28, pp. 263-282.) This Illinois 
bulletin gives one a representative sampling of what social sciences are 
now being taught, of the time allotment, the scope and content of such 
courses, the textbooks used and some indications of the methods of 
instruction that are followed. As the authors state, “No effort has
been made to interpret the facts”; but studied with the two earlier
reports, one may learn a great deal concerning what the North Central 
Association High Schools have been teaching in the field of the social 
sciences for the past seven or eight years. Earle Rugg.





1 Gesell, Arnold: “The Pre-school Child.” New York: Houghton Mifflin,
1923, pp. XV + 264. 

2 Monroe, W. S. and Foster, I. O.: The Status of the Social Sciences in the 
High Schools of the North Central Association. Bulletin 13, Bureau of Educational
Research, University of Illinois, Urbana, Ill., 1923.

BRIEF NOTICES OF OTHER PUBLICATIONS RECEIVED


(Because this number is the last issue before the summer recess the brief 
mention given below was deemed preferable to postponed review.) 

1. A Study of the Rise of Civilizations.—This is the first serious attempt
to analyze and relate the processes of human culture in systematic
fashion.¹ Students of sociology and ethnology will find the comparative
study of racial types facilitated by the illuminating organization
and presentation of data. 

2. The Measurement of Emotion.—An experimental study² of the
types of verbal reactions made to words presented orally by the associa- 
tion method. The reaction time and the galvanometric reflexes are 
recorded. The author finds evidence in support of Jung’s hypothesis 
that the association method is a useful device for uncovering emotional 
complexes. He also develops a theory concerning the facilitating
and inhibiting effects of emotions on recall.





3. The Accomplishment Ratio.—An account of the development and 
theoretical basis of the accomplishment ratio together with the results 
of its use in an elementary school. The author³ believes that there is
little specialization among school subjects and that all are essentially 
perfectly correlated with general intelligence under favorable methods
of instruction. A. J. G.





4. A Study of Questions.—This⁴ is a brief report of investigations
into the frequency with which various types of questions are used in 
secondary schools, and the character of the question as a specific 
stimulus to mental activity. Twenty types of questions are considered 
and the common faults of procedure in answering each type are related
to the shortcomings in the mental processes which lead to replies. 


1 Wissler, Clark: “Man and Culture.” New York: Crowell, 1923, pp. 371.

2 Smith, Whately W.: “The Measurement of Emotion.” New York: Harcourt,
Brace and Co., 1922, pp. 184.

3 Franzen, Raymond: “The Accomplishment Ratio.” New York: Teachers
College, 1922, pp. 59.

4 Monroe, Walter S. and Carter, Ralph E.: The Use of Different Types of
Thought Questions in Secondary Schools and Their Relative Difficulty for Students.
Bulletin No. 34, Vol. XX, University of Illinois, Urbana, Illinois, 1923,
pp. 26.

5. A Type in Job Analysis.—This book¹ is based on researches in
the field of commercial printing. It outlines in detail the method of 
gathering data and developing a curriculum for the training of printing 
executives. It should have a much broader appeal than this would 
seem to indicate in view of the fact that numerous other problems of 
personnel and curriculum could be solved by somewhat similar 
procedures. 





6. Four New Drawing Scales.—This bulletin² contains a brief
summary of the data and technique used in the construction of four 
scales for representative drawing. The four scales cover four types of 
free-hand drawing. 





7. Educational Implications of Mental Hygiene.—Educators and
laymen who cannot accept the Freudian analysis of mental disorders
will find this volume³ helpful in the diagnosis and treatment of certain
common types of psychic disorder. 





8. Mental Efficiency and Tobacco.—This volume⁴ presents data
derived from observation, laboratory tests, introspection and biography. 
The work is the first to be published in the name of a committee 
organized in 1918 to study the tobacco problem. 





9. The Basis of Social Behavior.—After defining the scope of the
book the author⁵ discusses in succession the following significant
phases of the subject: the sense of social unity, social motives,
intellectual levels and psychic stability, racial factors, suggestibility, 
the crowd, convention, custom and morale, social progress and adjust- 
ment. The work is built on the latest researches in social science and 
the related sciences. It is written in a style which will appeal to 
student and general reader alike. 



1 Strong, Edward K., Jr. and Uhrbrock, Richard S.: “Job Analysis and the
Curriculum.” Baltimore: Williams and Wilkins, 1923, pp. 146.

2 Kline, Linus W. and Carey, Gertrude L.: The Revised Kline-Carey Measuring
Scale for Free-hand Drawing. Part I, Representation. No. 5a. The
Johns Hopkins University Studies in Education. Baltimore: The Johns Hop- 
kins Press, 1923, pp. 10 + 4 scales. 

3 Bousfield, Paul: “The Omnipotent Self. A Study in Self-deception and
Self-cure.” New York: E. P. Dutton and Company, 1923, pp. VII + 183.

4 O’Shea, M. V.: “Tobacco and Mental Efficiency.” New York: MacMillan,
1923, pp. 258.

5 Gault, Robert H.: “Social Psychology.” New York: Holt, 1923, pp. 336.











